
Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms

Buğra Gedik, Member, IEEE, and Ling Liu, Senior Member, IEEE

Abstract: Continued advances in mobile networks and positioning technologies have created a strong market push for location-based applications. Examples include location-aware emergency response, location-based advertisement, and location-based entertainment. An important challenge in the wide deployment of location-based services (LBSs) is the privacy-aware management of location information, providing safeguards for location privacy of mobile clients against vulnerabilities for abuse. This paper describes a scalable architecture for protecting the location privacy from various privacy threats resulting from uncontrolled usage of LBSs. This architecture includes the development of a personalized location anonymization model and a suite of location perturbation algorithms. A unique characteristic of our location privacy architecture is the use of a flexible privacy personalization framework to support location k-anonymity for a wide range of mobile clients with context-sensitive privacy requirements. This framework enables each mobile client to specify the minimum level of anonymity that it desires and the maximum temporal and spatial tolerances that it is willing to accept when requesting k-anonymity-preserving LBSs. We devise an efficient message perturbation engine to implement the proposed location privacy framework. The prototype that we develop is designed to be run by the anonymity server on a trusted platform and performs location anonymization on LBS request messages of mobile clients such as identity removal and spatio-temporal cloaking of the location information. We study the effectiveness of our location cloaking algorithms under various conditions by using realistic location data that is synthetically generated from real road maps and traffic volume data. Our experiments show that the personalized location k-anonymity model, together with our location perturbation engine, can achieve high resilience to location privacy threats without introducing any significant performance penalty.

Index Terms: k-anonymity, location privacy, location-based applications, mobile computing systems.

1 INTRODUCTION

In his famous novel 1984 [1], George Orwell envisioned a world in which everyone is being watched, practically at all times and places. Although, as of now, the state of affairs has not come to such totalitarian control, projects like DARPA's recently dropped LifeLog [2], which has stimulated serious privacy concerns, attest that continuously tracking where individuals go and what they do is not only in the range of today's technological advances but also raises major personal privacy issues, regardless of the many beneficial applications that it can provide.

According to the report by the Computer Science and Telecommunications Board in IT Roadmap to a Geospatial Future [3], location-based services (LBSs) are expected to form an important part of the future computing environments that will be seamlessly and ubiquitously integrated into our lives. Such services are already being developed and deployed in the commercial and research worlds. For instance, the NextBus [4] service provides location-based transportation data, the CyberGuide [5] project investigates context-aware location-based electronic guide assistants, and the Federal Communications Commission (FCC)'s Phase II E911 requires wireless carriers to provide precise location information within 125 m in most cases for emergency purposes [6].

1.1 Location Privacy Risks

Advances in global positioning and wireless communication technologies create new opportunities for location-based mobile applications, but they also create significant privacy risks. Although, with LBSs, mobile clients can obtain a wide variety of location-based information services, and businesses can extend their competitive edges in mobile commerce and ubiquitous service provisions, the extensive deployment of LBSs can open doors for adversaries to endanger the location privacy of mobile clients and to expose LBSs to significant vulnerabilities for abuse [7].

A major privacy threat specific to LBS usage is the location privacy breaches represented by space or time-correlated inference attacks. Such breaches take place when a party that is not trusted gets access to information that reveals the locations visited by the individual, as well as the times during which these visits took place. An adversary can utilize such location information to infer details about the private life of an individual such as their political affiliations, alternative lifestyles, or medical problems [8] or the private businesses of an organization such as new business initiatives and partnerships.

Consider a mobile client which receives a real-time traffic and roadside information service from an LBS provider. If a user submits her service request messages with raw position information, the privacy of the user can be compromised in several ways, assuming that the LBS providers are not trusted but semihonest. For instance, if the LBS provider has access to information that associates location with identity, such as "person A lives in location L," and if it observes that all request messages within location L are from a single user, then it can infer that the identity of the user requesting the roadside information service is A. Once the identity of the user is revealed, further tracking of future positions can be performed by using a simple connect-the-dots approach. In the literature, this type of attack is called Restricted Space Identification [8]. Another possible attack is to reveal the user's identity by relating some external observation on location-identity binding to a message. For instance, if person A was reported to visit location L during time interval τ, and if the LBS provider observed that all request messages during time interval τ came from a single user within location L, then it can infer that the identity of the user in question is A. The second type of attack is called Observation Identification [8] in the literature.

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 7, NO. 1, JANUARY 2008

B. Gedik is with the IBM Thomas J. Watson Research Center, 19 Skyline Dr., Hawthorne, NY 10532. E-mail: [email protected].

L. Liu is with the College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332. E-mail: [email protected].

Manuscript received 2 Nov. 2005; revised 19 May 2006; accepted 21 Mar. 2007; published online 2 Apr. 2007. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-0317-1105.

Digital Object Identifier no. 10.1109/TMC.2007.1062. 1536-1233/08/$25.00 © 2008 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS.

In order to protect the location information from third parties that are semihonest but not completely trusted, we define a security perimeter around the mobile client. In this paper, the security perimeter includes the mobile client of the user, the trusted anonymity server, and a secure channel where the communication between the two is secured through encryption (see Fig. 1). By semihonest, we mean that the third-party LBS providers are honest and can correctly process and respond to messages, but are curious in that they may attempt to determine the identity of a user based on what they see, which includes information in the physical world that can lead to location-identity binding or association. We do not consider cases where LBS providers are malicious. Thus, we do not consider the attack scenarios in which the LBSs can inject a large number of colluding dummy users into the system.

1.2 Architecture Overview

We show the system architecture in Fig. 1. Mobile clients communicate with third-party LBS providers through the anonymity server. The anonymity server is a secure gateway to the semihonest LBS providers for the mobile clients. It runs a message perturbation engine, which performs location perturbation on the messages received from the mobile clients before forwarding them to the LBS provider. Each message sent to an LBS provider contains the location information of the mobile client and a time stamp, in addition to service-specific information. Upon receiving a message from a mobile client, the anonymity server removes any identifiers such as Internet Protocol (IP) addresses, perturbs the location information through spatio-temporal cloaking, and then forwards the anonymized message to the LBS provider. Spatial cloaking refers to replacing a 2D point location by a spatial range, where the original point location lies anywhere within the range. Temporal cloaking refers to replacing a time point associated with the location point with a time interval that includes the original time point. These terms were introduced by Gruteser and Grunwald [8]. In our work, the term location perturbation refers to the combination of spatial cloaking and temporal cloaking.
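The two cloaking operations can be illustrated with a minimal sketch, assuming a fixed grid: a point is replaced by the grid cell that contains it, and a time stamp by the fixed-length window that contains it. The function names `spatial_cloak` and `temporal_cloak`, the grid-snapping strategy, and the sample cell sizes are assumptions made here for illustration only; they are not the cloaking algorithms developed in this paper.

```python
import math


def spatial_cloak(x, y, cell):
    """Replace the point (x, y) by the square grid cell of side `cell` containing it.

    Hypothetical helper: fixed-grid snapping is an assumption of this sketch.
    """
    xs = math.floor(x / cell) * cell
    ys = math.floor(y / cell) * cell
    return (xs, xs + cell), (ys, ys + cell)


def temporal_cloak(t, window):
    """Replace the time point t by the window of length `window` containing it."""
    ts = math.floor(t / window) * window
    return (ts, ts + window)


# Example: a client at (1234, 987) meters reporting at t = 61 seconds.
x_range, y_range = spatial_cloak(1234.0, 987.0, cell=500.0)
t_range = temporal_cloak(61.0, window=30.0)
```

Both results contain the original value, which is the property the definitions of spatial and temporal cloaking require; a larger cell or window yields a coarser range.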

1.3 k-Anonymity and Location k-Anonymity

There are two popular approaches to protect location privacy in the context of LBS usage: policy-based [9] and anonymity-based approaches [8]. In policy-based approaches, mobile clients specify their location privacy preferences as policies and completely trust that the third-party LBS providers adhere to these policies. In the anonymity-based approaches, the LBS providers are assumed to be semihonest instead of completely trusted. We advocate k-anonymity preserving management of location information by developing efficient and scalable system-level facilities for protecting the location privacy through ensuring location k-anonymity. We assume that anonymous location-based applications do not require user identities for providing service. A discussion on pseudonymous and nonanonymous LBSs is provided in Section 7.

The concept of k-anonymity was originally introduced in the context of relational data privacy [10], [11]. It addresses the question of how a data holder can release its private data with guarantees that the individual subjects of the data cannot be identified whereas the data remain practically useful [12]. For instance, a medical institution may want to release a table of medical records with the names of the individuals replaced with dummy identifiers. However, some set of attributes can still lead to identity breaches. These attributes are referred to as the quasi-identifier. For instance, the combination of birth date, zip code, and gender attributes in the disclosed table can uniquely determine an individual. By joining such a medical record table with some publicly available information source like a voters list table, the medical information can be easily linked to individuals. k-anonymity prevents such a privacy breach by ensuring that each individual record can only be released if there are at least $k - 1$ distinct individuals whose associated records are indistinguishable from the former in terms of their quasi-identifier values.
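The relational form of the property can be checked with a short sketch: group the released records by their quasi-identifier values and require every group to have at least k members. The function name `is_k_anonymous`, the attribute names, and the sample records are hypothetical, and the sketch assumes that each record belongs to a distinct individual.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifier, k):
    """Return True if every quasi-identifier combination occurs in at least k records.

    Assumes one record per individual (an assumption of this sketch).
    """
    groups = Counter(tuple(r[a] for a in quasi_identifier) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical released table; names are already removed, zip codes generalized.
records = [
    {"birth_year": 1970, "zip": "303**", "gender": "F", "diagnosis": "flu"},
    {"birth_year": 1970, "zip": "303**", "gender": "F", "diagnosis": "asthma"},
    {"birth_year": 1982, "zip": "105**", "gender": "M", "diagnosis": "flu"},
]
qi = ("birth_year", "zip", "gender")

whole_table_ok = is_k_anonymous(records, qi, k=2)       # third record is unique
first_two_ok = is_k_anonymous(records[:2], qi, k=2)     # both share one group
```

In this sample, the third record is the only one with its quasi-identifier values, so releasing the whole table would not be 2-anonymous, whereas the first two records alone would be.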

In the context of LBSs and mobile clients, location k-anonymity refers to the k-anonymity usage of location information. A subject is considered location k-anonymous if and only if the location information sent from a mobile client to an LBS is indistinguishable from the location information of at least $k - 1$ other mobile clients [8]. This paper argues that location perturbation is an effective technique for supporting location k-anonymity and dealing with location privacy breaches exemplified by the location inference attack scenarios discussed in Section 1.1. If the location information sent by each mobile client is perturbed by replacing the position of the mobile client with a coarser grained spatial range such that there are $k - 1$ other mobile clients within that range ($k > 1$), then the adversary will have uncertainty in matching the mobile client to a known location-identity association or an external observation of the location-identity binding. This uncertainty increases with the increasing value of k, providing a higher degree of privacy for mobile clients. Referring back to the roadside information service example of Section 1.1, now, even though the LBS provider knows that person A was reported to visit location L, it cannot match a message to this user with certainty. This is because there are at least k different mobile clients within the same location L. As a result, it cannot perform further tracking without ambiguity.

Fig. 1. System architecture.

1.4 Contributions and Scope of the Paper

This paper describes a personalized k-anonymity model for protecting location privacy. Our design and development are motivated by an important observation: location privacy demands personalization. In other words, location privacy is a personalized requirement and is context sensitive. An individual may have different privacy requirements in different contexts, and different individuals may require different levels of privacy in the same context. Unfortunately, existing anonymization solutions to location privacy threats in accessing LBSs, such as the earlier work on location k-anonymity [8], are essentially "one size fits all": users do not have the ability to tailor the personalization capability to meet their individual privacy preferences. Most of the LBSs today promise ubiquitous access in an increasingly connected world. However, they do not take into account the possibility that a user's willingness to share location data may depend on a range of factors such as various contextual information about the user, for example, environmental context, task context, social context, and so forth. One way to support personalized location k-anonymity is to allow users to specify different values of k at different times.

There is a close synergy between location privacy and location k-anonymity. A larger k in location anonymity implies higher guarantees for location privacy. However, larger k values make it necessary to perform additional spatial and temporal cloaking for successful message anonymization, resulting in low spatial and temporal resolution for the anonymized messages. This, in turn, may lead to degradation in the quality of service (QoS) of LBS applications, such as taking a longer time to serve the user's request or sending more than the required information back to the mobile client as a result of the inaccuracy associated with the user's location after applying location cloaking.

These observations illustrate that there is a trade-off between the desired level of location privacy and the resulting loss of QoS from LBSs. The "one-size-fits-all" approach to location privacy negatively impacts the QoS for mobile clients with lower privacy requirements. This calls for a framework that can handle users with different location privacy requirements and allows users to specify their preferred balance between the degree of privacy and the QoS.

Our location privacy model exhibits two distinct features. The first characteristic of our model is its ability to enable each mobile client to specify the minimum level of anonymity that it desires and the maximum temporal and spatial tolerances that it is willing to accept when requesting k-anonymity preserving LBSs. Concretely, instead of imposing a uniform k for all mobile clients, we provide efficient algorithms and system-level facilities to support a personalized k at a per-user level. Each user can specify a different k-anonymity level based on her specific privacy requirement and can change this specification at a per-message granularity. Furthermore, each user can specify her preferred spatial and temporal tolerances that should be respected by the location perturbation engine while maintaining the desired level of location k-anonymity. We call such tolerance specification and preference of the k value the anonymization constraint of the message. The preference of the k value defines the desired level of privacy protection that a mobile client wishes to have, whereas the temporal and spatial tolerance specifications define the acceptable level of loss in QoS from the LBS applications.

The second distinctive characteristic of our location privacy model is the development of an efficient message perturbation engine, which is run by the trusted anonymization server, and performs location anonymization on mobile clients' LBS request messages such as identity removal and spatio-temporal cloaking of location information. We develop a suite of scalable and efficient spatio-temporal location cloaking algorithms, taking into account the trade-off between location privacy and QoS. Our location perturbation engine can continuously process a stream of messages for location k-anonymity and can work with different cloaking algorithms to perturb the location information contained in the messages sent from mobile clients before forwarding any request messages to the LBS provider(s).

We conduct a series of experimental evaluations. Our results show that the proposed personalized location k-anonymity model, together with our message perturbation engine and location cloaking algorithms, can achieve high guarantees of k-anonymity and high resilience to location privacy threats without introducing significant performance penalties. To the best of our knowledge, the previous work on location privacy has not addressed these issues. An earlier version of this paper appeared in a conference [13], which focuses on the basic concept of personalized location k-anonymity and the design ideas of our base algorithm for performing personalized location cloaking. In contrast, this paper presents the complete framework for supporting personalized location privacy, which includes the formal definition of personalized location k-anonymity, the theoretical framework for the proposed base algorithm with respect to its compliance with personalized location k-anonymity, the optimized algorithms that enhance the base algorithm, and an extensive experimental study illustrating the effectiveness of the newly proposed algorithms.

2 PERSONALIZED LOCATION K-ANONYMITY: TERMINOLOGY AND DEFINITIONS

2.1 Message Anonymization Basics

In order to capture varying location privacy requirements and ensure different levels of service quality, each mobile client specifies its anonymity level (k value), spatial tolerance, and temporal tolerance. The main task of a location anonymity server is to transform each message received from mobile clients into a new message that can be safely (k-anonymity) forwarded to the LBS provider. The key idea that underlies the location k-anonymity model is twofold. First, a given degree of location anonymity can be maintained, regardless of population density, by decreasing the location accuracy through enlarging the exposed spatial area such that there are other $k - 1$ mobile clients present in the same spatial area. This approach is called spatial cloaking. Second, one can achieve location anonymity by delaying the message until k mobile clients have visited the same area located by the message sender. This approach is called temporal cloaking. For reference convenience, we provide Table 1 to summarize the important notations used in this section and throughout the rest of the paper. The last three notations will be introduced in Section 3.

We denote the set of messages received from the mobile clients as S. We formally define the messages in the set S as follows:

$$m_s \in S:\ \Big\langle \underbrace{u_{id},\ r_{no}}_{\text{sender id, message no}},\ \underbrace{\{t, x, y\}}_{\text{spatiotemporal point}},\ \underbrace{k}_{\text{anonymity level}},\ \underbrace{\{d_t, d_x, d_y\}}_{\text{temporal and spatial tolerances}},\ C \Big\rangle.$$

Messages are uniquely identifiable by the sender's identifier, message reference number pairs $(u_{id}, r_{no})$, within the set S. Messages from the same mobile client have the same sender identifiers but different reference numbers. In a received message, x, y, and t together form the 3D spatio-temporal location point of the message, denoted as $L(m_s)$. The coordinate $(x, y)$ refers to the spatial position of the mobile client in the 2D space (x-axis and y-axis), and the time stamp t refers to the time point at which the mobile client was present at that position (temporal dimension: t-axis).

The k value of the message specifies the desired minimum anonymity level. A value of $k = 1$ means that anonymity is not required for the message. A value of $k > 1$ means that the perturbed message will be assigned a spatio-temporal cloaking box that is indistinguishable from at least $k - 1$ other perturbed messages, each from a different mobile client. Thus, larger k values imply higher degrees of privacy. One way to determine the appropriate k value is to assess the certainty with which an adversary can link the message with a location/identity association or binding. This certainty is given by $1/k$.

The $d_t$ value of the message represents the temporal tolerance specified by the user. It means that the perturbed message should have a spatio-temporal cloaking box whose projection on the temporal dimension does not contain any point more than $d_t$ distance away from t. Similarly, $d_x$ and $d_y$ specify the tolerances with respect to the spatial dimensions. The values of these three parameters are dependent on the requirements of the external LBS and the user's preferences with regard to QoS. For instance, larger spatial tolerances may result in less accurate results to location-dependent service requests, and larger temporal tolerances may result in higher latencies of the messages. The $d_t$ value also defines a deadline for the message such that a message should be anonymized until time $m_s.t + m_s.d_t$. If not, then the message cannot be anonymized according to its constraints, and it is dropped. Let $\Phi(v, d) = [v - d, v + d]$ be a function that extends a numerical value v to a range by amount d. Then, we denote the spatio-temporal constraint box of a message $m_s$ as $B_{cn}(m_s)$ and define it as $(\Phi(m_s.x, m_s.d_x),\ \Phi(m_s.y, m_s.d_y),\ \Phi(m_s.t, m_s.d_t))$. The field C in $m_s$ denotes the message content.
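The message fields, the range-extension function, the constraint box, and the deadline translate directly into a small data structure. The sketch below is an illustration under assumptions: the class name `Message`, the function names `phi`, `constraint_box`, and `deadline`, and the sample values are hypothetical and do not come from the paper's prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A raw request message: sender id, message no, point, k, tolerances, content."""
    uid: str
    rno: int
    t: float
    x: float
    y: float
    k: int
    dt: float
    dx: float
    dy: float
    content: str = ""


def phi(v, d):
    """Extend the value v to the range [v - d, v + d]."""
    return (v - d, v + d)


def constraint_box(ms):
    """Spatio-temporal constraint box of a message as (x-range, y-range, t-range)."""
    return (phi(ms.x, ms.dx), phi(ms.y, ms.dy), phi(ms.t, ms.dt))


def deadline(ms):
    """Latest time by which the message must be anonymized, else it is dropped."""
    return ms.t + ms.dt


# Hypothetical message: position (10, 20) at time 100, k = 3.
ms = Message("u1", 1, t=100.0, x=10.0, y=20.0, k=3, dt=30.0, dx=5.0, dy=5.0)
box = constraint_box(ms)
```

Note that the deadline coincides with the upper end of the temporal range of the constraint box.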

We denote the set of perturbed (anonymized) messages as T. The messages in T are defined as follows:

$$m_t \in T:\ \Big\langle u_{id},\ r_{no},\ \underbrace{\{X: [x_s, x_e],\ Y: [y_s, y_e],\ I: [t_s, t_e]\}}_{B_{cl}(m_t):\ \text{spatiotemporal cloaking box}},\ C \Big\rangle.$$

For each message $m_s$ in S, there exists at most one corresponding message $m_t$ in T. We call the message $m_t$ the perturbed format of message $m_s$, denoted as $m_t = R(m_s)$. The function R defines a surjection (onto mapping) from S to $T \cup \{\text{null}\}$. Concretely, if $R(m_s) = m_t$, then $m_t.u_{id} = m_s.u_{id}$ and $m_t.r_{no} = m_s.r_{no}$. If $R(m_s) = \text{null}$, then the message $m_s$ is not anonymized. The $(u_{id}, r_{no})$ fields of a message in T are replaced with a randomly generated identifier before the message can be safely forwarded to the LBS provider.

In a perturbed message, $X: [x_s, x_e]$ denotes the extent of the spatio-temporal cloaking box of the message on the x-axis, with $x_s$ and $x_e$ denoting the two end points of the interval. The definitions of $Y: [y_s, y_e]$ and $I: [t_s, t_e]$ are similar, with the y-axis and t-axis replacing the x-axis, respectively. We denote the spatio-temporal cloaking box of a perturbed message as $B_{cl}(m_t)$ and define it as $(m_t.X: [x_s, x_e],\ m_t.Y: [y_s, y_e],\ m_t.I: [t_s, t_e])$. The field C in $m_t$ denotes the message content.

2.1.1 Basic Service Requirements

The following basic properties must hold between a raw message $m_s$ in S and its perturbed format $m_t$ in T, where the $A \sqsubseteq B$ notation is used in the rest of the paper to express that the n-dimensional rectangular region A is contained in the n-dimensional rectangular region B:

1. Spatial containment: $m_s.x \in m_t.X$, $m_s.y \in m_t.Y$.
2. Spatial resolution: $m_t.X \sqsubseteq \Phi(m_s.x, m_s.d_x)$, $m_t.Y \sqsubseteq \Phi(m_s.y, m_s.d_y)$.
3. Temporal containment: $m_s.t \in m_t.I$.
4. Temporal resolution: $m_t.I \sqsubseteq \Phi(m_s.t, m_s.d_t)$.
5. Content preservation: $m_s.C = m_t.C$.


TABLE 1. Notation Reference Table

Spatial containment and temporal containment requirements state that the cloaking box of the perturbed message $B_{cl}(m_t)$ should contain the spatio-temporal point $L(m_s)$ of the original message $m_s$. Spatial resolution and temporal resolution requirements amount to saying that, for each of the three dimensions, the extent of the spatio-temporal cloaking box of the perturbed message on that dimension should be contained within the interval defined by the desired maximum tolerance value specified in the original message. This is equivalent to stating that the cloaking box of the perturbed message should be contained within the constraint box of the original message, i.e., $B_{cl}(m_t) \sqsubseteq B_{cn}(m_s)$. The content preservation property ensures that the message content remains as it is.
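The five basic service requirements amount to a handful of interval comparisons. The following sketch checks them for one raw message and one perturbed message; the dictionary layout, the function names, and the sample values are assumptions made for illustration.

```python
def phi(v, d):
    """Extend the value v to the range [v - d, v + d]."""
    return (v - d, v + d)


def inside(point, interval):
    """True if the point lies in the closed interval."""
    return interval[0] <= point <= interval[1]


def within(inner, outer):
    """True if the interval `inner` is contained in the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def meets_basic_requirements(ms, mt):
    """Check requirements 1-5 for a raw message ms and its perturbed format mt."""
    return (
        inside(ms["x"], mt["X"]) and inside(ms["y"], mt["Y"])   # 1. spatial containment
        and within(mt["X"], phi(ms["x"], ms["dx"]))              # 2. spatial resolution (x)
        and within(mt["Y"], phi(ms["y"], ms["dy"]))              # 2. spatial resolution (y)
        and inside(ms["t"], mt["I"])                             # 3. temporal containment
        and within(mt["I"], phi(ms["t"], ms["dt"]))              # 4. temporal resolution
        and ms["C"] == mt["C"]                                   # 5. content preservation
    )


# Hypothetical raw message and two candidate perturbed formats.
ms = {"x": 10.0, "y": 20.0, "t": 100.0, "dx": 5.0, "dy": 5.0, "dt": 30.0,
      "C": "nearest gas station"}
ok = {"X": (8.0, 13.0), "Y": (18.0, 24.0), "I": (95.0, 120.0),
      "C": "nearest gas station"}
too_wide = dict(ok, X=(8.0, 17.0))  # x-extent exceeds the tolerance range [5, 15]
```

Here `ok` satisfies all five requirements, while `too_wide` violates the spatial resolution requirement because its x-extent leaves the constraint box.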

2.1.2 Anonymity Requirement

We formally capture the essence of location k-anonymity by the following requirement, which states that, for a message $m_s$ in S and its perturbed format $m_t$ in T, the following condition must hold:

Location k-anonymity: $\exists\, T' \subset T$ such that $m_t \in T'$, $|T'| \geq m_s.k$,

$$\forall \{m_{t_i}, m_{t_j}\} \subset T',\ m_{t_i}.u_{id} \neq m_{t_j}.u_{id} \quad \text{and} \quad \forall m_{t_i} \in T',\ B_{cl}(m_{t_i}) = B_{cl}(m_t).$$

The k-anonymity requirement demands that, for each perturbed message $m_t = R(m_s)$, there exist at least $m_s.k - 1$ other perturbed messages with the same spatio-temporal cloaking box, each from a different mobile client. A key challenge for the cloaking algorithms employed by the message perturbation engine is to find a set of messages within a minimal spatio-temporal cloaking box that satisfies the above conditions.
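As a sanity check on this requirement, a minimal sketch can verify it over a set of perturbed messages: for every message, the number of distinct senders sharing its cloaking box must be at least the k requested by that message. The function name, the dictionary layout, and the `required_k` lookup table are hypothetical; this is a verifier for the condition, not a cloaking algorithm.

```python
from collections import defaultdict


def satisfies_location_k_anonymity(perturbed, required_k):
    """Check the location k-anonymity condition for every perturbed message.

    perturbed:  list of dicts with keys 'uid', 'rno', and 'box' (a hashable
                tuple of (x-range, y-range, t-range)).
    required_k: maps (uid, rno) to the k value of the original message.
    """
    # Collect the distinct senders that share each cloaking box.
    senders_by_box = defaultdict(set)
    for mt in perturbed:
        senders_by_box[mt["box"]].add(mt["uid"])
    # Each message needs at least k distinct senders (itself included) in its box.
    return all(
        len(senders_by_box[mt["box"]]) >= required_k[(mt["uid"], mt["rno"])]
        for mt in perturbed
    )


# Hypothetical example: three clients share one cloaking box.
box = ((0.0, 10.0), (0.0, 10.0), (0.0, 60.0))
perturbed = [
    {"uid": "u1", "rno": 1, "box": box},
    {"uid": "u2", "rno": 7, "box": box},
    {"uid": "u3", "rno": 2, "box": box},
]
required_k = {("u1", 1): 3, ("u2", 7): 2, ("u3", 2): 2}
```

With all three messages present the condition holds for every message; if the message of `u3` were withheld, the message of `u1`, which asks for k = 3, would no longer be covered.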

The following formal definition summarizes the person-alized location k-anonymity.

Definition 2.1 (Personalized location k-anonymity). The set T of anonymized messages respects the personalized location k-anonymity if and only if the following conditions are met for each message $m_t$ in T and its corresponding message $m_s$ in S, where $m_t = R(m_s)$:

1. spatial containment,
2. spatial resolution,
3. temporal containment,
4. temporal resolution,
5. content preservation, and
6. location k-anonymity.

2.2 Location k-Anonymity: Privacy Value and Performance Implications

The personalized location k-anonymity model presented in this paper strives to strike a balance between providing an effective location privacy and maintaining the desired QoS and performance characteristics. In the rest of this section, we first describe the privacy value of our approach and then discuss the impact of location perturbation on the performance and QoS of the location-based applications.

2.2.1 Privacy Value of Location k-Anonymity

We illustrate the privacy value of our location k-anonymity model by comparing it with the strategy that only masks the sources of the LBS requests. Assuming that the sources of the service request messages of mobile clients can be masked from the LBS servers through the use of a trusted anonymization server as a mix [14], [15], we want to show that the location information retained in these messages can still create a threat against location privacy, especially when combined with information obtained from external observation or knowledge.

As an example, consider the scenario in Fig. 2 where three mobile clients, labeled A, B, and C, and their trajectories (routes), denoted as $R_a$, $R_b$, and $R_c$, are depicted. If no location k-anonymization is performed at the anonymity server, then the following linking attack can be performed by an adversary. First, the adversary can construct an approximation of the trajectories of the mobile clients by using the location information exposed in the service request messages of the mobile clients. Second, it can obtain a location/identity binding through external observation [8] such as "A has been spotted at position x at time t." Finally, the adversary can link this binding with the trajectories that it has at hand, finding the correct trajectory of node A. In Fig. 2, if node A has been spotted at the position marked x, then the trajectory $R_a$ can be associated with A since no other trajectories cross the point x. However, such a linking attack is not effective if proper location perturbation is performed by the trusted anonymity server. For instance, assume that the location information in the service request messages of nodes A, B, and C are perturbed according to location k-anonymity with $k = 3$. The location k-anonymization replaces the point position information x included in the request messages by a location cloaking box, as shown in Fig. 2. Now, even if the adversary can infer approximate trajectories, it cannot link the location/identity binding with one of the trajectories with certainty because the point x now links with three different request messages, each with probability $k^{-1} = 1/3$. In summary, the higher the k value, the better the protection that one can achieve against linking attacks.

2.2.2 QoS and Performance Implications of Location k-Anonymity

Achieving location k-anonymity with higher k, thus with higher location privacy, typically requires assigning larger cloaking boxes to perturbed messages. However, arbitrarily large cloaking boxes can potentially result in a decreased level of QoS or performance with respect to the target location-based application. Here, we give an example scenario to illustrate the negative impacts of using large cloaking boxes.

Fig. 2. Linking attack.

Assume that our LBS application provides a resource location service such as providing answers to continuous queries asking for the nearest resources with certain types and properties during a specified time interval; for example, the nearest gas station offering gas under \$2/gal during the next half-hour. Fig. 3 shows four objects $o_1$, $o_2$, $o_3$, and $o_4$ that are valid gas stations for the result of such a query posed by a mobile client A, where the aim of the LBS is to provide A with the nearest gas station, that is, $o_1$. The figure also shows a sample cloaking box assigned to one of the service request messages of node A by the anonymity server. Since the LBS is unable to tell the exact location of node A within the cloaking box, and since the cloaking box covers all of the four objects, the LBS will be unable to determine the exact result; that is, it cannot decide which of the four objects is the closest one. There are different possible ways of handling this problem, each with a different shortcoming. First, the server may choose to report only one result by making a random decision. In this case, the QoS of the application, as perceived by the user, will degrade. The larger the cloaking box is, the higher the expected distance between the actual result and the reported result will be. Second, the server may choose to report all valid resources within the cloaking box. In this case, several side effects arise. For instance, one side effect is that, the larger the cloaking box is, the larger the size of the result set will be and, thus, the higher the communication and processing costs are, which degrades the performance. Moreover, the result set requires further filter processing at the mobile client side to satisfy the application semantics. This means that each LBS may need to ship some amount of postprocessing to the mobile client side in order to complete query processing, and this results in running untrusted code at the mobile client side. On the other hand, the LBS may choose to skip the postprocessing step, which leaves the filtering to the user of the system and thus decreases the QoS. An alternative for the LBS is to use probabilistic query processing techniques [16] to rank the results and provide hints to the filtering process to alleviate the degradation in QoS.
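The second option, in which the LBS reports all valid resources within the cloaking box and the client filters them, can be sketched as follows. The function names, the coordinates of the stations, and the extra station `o5` outside the box are hypothetical values chosen for illustration; the sketch mirrors only the behavior described in the text and does not address resources lying just outside the box.

```python
import math


def resources_in_box(resources, x_range, y_range):
    """Server side: return every resource whose position falls inside the cloaking box."""
    return {
        name: (x, y)
        for name, (x, y) in resources.items()
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    }


def nearest(candidates, position):
    """Client side: filter the candidate set using the exact position of the client."""
    return min(candidates, key=lambda name: math.dist(candidates[name], position))


# Hypothetical gas stations; only the client knows its exact position (2, 2).
stations = {
    "o1": (3.0, 2.0),
    "o2": (8.0, 8.0),
    "o3": (1.0, 9.0),
    "o4": (9.0, 1.0),
    "o5": (20.0, 20.0),  # outside the cloaking box
}
candidates = resources_in_box(stations, x_range=(0.0, 10.0), y_range=(0.0, 10.0))
best = nearest(candidates, position=(2.0, 2.0))
```

The candidate set grows with the cloaking box, which is the communication and processing cost noted above, and the final selection runs at the client because only the client knows its exact position.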

In summary, we argue that there is a need for adjusting the balance between the level of protection provided by location k-anonymity and the level of performance degradation in terms of the QoS of LBSs due to location cloaking. Such a balance should be application driven. Our personalized location k-anonymity model makes it possible to adjust this balance at a per-user level, with the granularity of individual messages.

3 MESSAGE PERTURBATION: DESIGN AND ALGORITHMS

In this section, we first give an overview of the message perturbation engine. Then, we establish a theoretical basis for performing spatio-temporal cloaking. This is followed by an in-depth description of the perturbation engine and the algorithms involved.

3.1 Engine Overview

The message perturbation engine processes each incoming message $m_s$ from mobile clients in four steps. The first step, called zoom in, involves locating a subset of all messages currently pending in the engine. This subset contains messages that are potentially useful for anonymizing the newly received message $m_s$. The second step, called detection, is responsible for finding the particular group of messages within the set of messages located in the zoom-in step such that this group of messages can be anonymized together with the newly received message $m_s$. If such a group of messages is found, then the perturbation is performed over these messages in the third step, called perturbation, and the perturbed messages are forwarded to the LBS provider. The last step, called expiration, checks for pending messages whose deadlines have passed and thus should be dropped. The deadline of a message is the high point along the temporal dimension of its spatio-temporal constraint box, and it is bounded by the user-specified temporal tolerance level.

In order to guide the process of finding the particular set of messages that should be anonymized as a group, we develop the CliqueCloak theorem (see Section 3.2). We refer to the cloaking algorithms that make their decisions based on this theorem as the CliqueCloak algorithms. The perturbation engine, which is driven by the local-k search as the main component of the detection step, is our base CliqueCloak algorithm. Other CliqueCloak algorithms are discussed in Section 4.

3.2 Grouping Messages for Anonymization

A key objective for location anonymization is to develop efficient location cloaking algorithms for providing personalized privacy protection while maintaining the desired QoS. A main technical challenge for developing an efficient cloaking algorithm is to find the smallest spatio-temporal cloaking box for each message $m_s \in S$ within its specified spatial and temporal tolerances such that there exist at least $m_s.k - 1$ other messages, each from a different mobile client, with the same minimal cloaking box. Let us consider this problem in two steps in reverse order: 1) given a set $M$ of messages that can be anonymized together, how can the minimal cloaking box in which all messages in $M$ reside be found, and 2) for a message $m_s \in S$, how can the set $M$ containing $m_s$ and the group of messages that can be anonymized together with $m_s$ be found? A set $M \subseteq S$ of messages is said to be anonymized together if the messages are assigned the same cloaking box and all the requirements defined in Section 2.1 are satisfied for all messages in $M$.

Fig. 3. Loss of QoS/performance.

Consider a set $M \subseteq S$ of messages that can be anonymized together. The best strategy to find a minimal cloaking box for all messages in $M$ is to use the minimum bounding rectangle (MBR$^1$) of the spatio-temporal points of the messages in $M$ as the minimal cloaking box. This definition of minimal cloaking box also ensures that the cloaking box is contained in the constraint boxes of all other messages in $M$. We denote the minimum spatio-temporal cloaking box of a set $M \subseteq S$ of messages that can be anonymized together as $B_m(M)$, and we define it to be equal to the MBR of the points in the set $\{L(m_s) \mid m_s \in M\}$, where $L(m_s)$ denotes the spatio-temporal location point of the message $m_s$.
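As an illustration of the definition above, the following minimal Python sketch computes the minimum bounding box of a set of spatio-temporal points. The type aliases and the function name `minimum_bounding_box` are assumptions made for this sketch and do not come from the paper; points are taken to be plain `(x, y, t)` tuples.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]          # (x, y, t) location point of a message
Interval = Tuple[float, float]              # (low, high)
Box = Tuple[Interval, Interval, Interval]   # one interval per dimension: x, y, t


def minimum_bounding_box(points: Iterable[Point]) -> Box:
    """Return the smallest axis-aligned box enclosing all (x, y, t) points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs, ys, ts = zip(*pts)
    # The box is tight in every dimension: min and max over the points.
    return ((min(xs), max(xs)), (min(ys), max(ys)), (min(ts), max(ts)))
```

Because the box is defined by per-dimension minima and maxima, adding a message to the group can only enlarge it, which is why larger groups trade resolution for anonymity.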

Now, let us consider the second step: Given a message $m_s \in S$, how can the set $M$ containing $m_s$ and the group of messages that can be anonymized together with $m_s$ be found? Based on the above analysis and observations, one way to tackle this problem is to model the anonymization constraints of all messages in $S$ as a constraint graph defined below and translate the problem into the problem of finding cliques that satisfy certain conditions in the constraint graph.

Definition 3.1 (Constraint graph). Let $G(S, E)$ be an undirected graph where $S$ is the set of vertices, each representing a message received at the trusted location perturbation engine, and $E$ is the set of edges. There exists an edge $e = (m_{s_i}, m_{s_j}) \in E$ between two vertices $m_{s_i}$ and $m_{s_j}$ if and only if the following conditions hold: 1) $L(m_{s_i}) \in B_{cn}(m_{s_j})$, 2) $L(m_{s_j}) \in B_{cn}(m_{s_i})$, and 3) $m_{s_i}.u_{id} \neq m_{s_j}.u_{id}$. We call this graph the constraint graph.

Conditions 1, 2, and 3 together state that two messages are connected in the constraint graph if and only if they originate from different mobile clients and their spatio-temporal points are contained in each other's constraint boxes defined by their tolerance values.
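The three edge conditions can be sketched in Python as follows. This is a hedged illustration rather than the engine's actual code: the `Message` class, its field names, and the helper names are hypothetical, and the constraint box is assumed to be the symmetric box that extends each coordinate by its tolerance in both directions (consistent with the side length of twice the tolerance used in Section 5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    uid: str      # sender identifier
    x: float      # spatial coordinates of the location point
    y: float
    t: float      # timestamp of the location point
    dx: float     # spatial tolerances
    dy: float
    dt: float     # temporal tolerance
    k: int        # requested anonymity level


def in_constraint_box(p: Message, q: Message) -> bool:
    """True if the location point of p lies inside the constraint box of q."""
    return (abs(p.x - q.x) <= q.dx
            and abs(p.y - q.y) <= q.dy
            and abs(p.t - q.t) <= q.dt)


def has_edge(a: Message, b: Message) -> bool:
    """Conditions 1-3: mutual containment and different senders."""
    return (a.uid != b.uid
            and in_constraint_box(a, b)
            and in_constraint_box(b, a))
```

Note that containment must hold in both directions: a message with a tight tolerance does not become a neighbor of a nearby message just because the other message is tolerant.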

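Theorem 3.1 (CliqueCloak theorem). Let $G(S, E)$ be a constraint graph, $M = \{m_{s_1}, m_{s_2}, \ldots, m_{s_l}\} \subseteq S$, and $\forall 1 \le i \le l$, $m_{t_i} = \langle m_{s_i}.u_{id}, m_{s_i}.r_{no}, B_m(M), m_{s_i}.C \rangle$. Then, $\forall 1 \le i \le l$, $m_{t_i}$ is a valid k-anonymous perturbation of $m_{s_i}$, i.e., $m_{t_i} = R(m_{s_i})$, if and only if the set $M$ of messages forms an $l$-clique in the constraint graph $G(S, E)$ such that $\forall 1 \le i \le l$, $m_{s_i}.k \le l$.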

Proof. First, we show that the left-hand side holds if we assume that the right-hand side holds. The spatial and temporal containment requirements are satisfied, as we have $\forall 1 \le i \le l$, $L(m_{s_i}) \in B_m(M) = B_{cl}(m_{t_i})$ from the definition of an MBR. k-anonymity is also satisfied, as for any message $m_{s_i} \in M$, there exist $l \ge m_{s_i}.k$ messages $\{m_{t_1}, m_{t_2}, \ldots, m_{t_l}\} \subseteq T$ such that $\forall 1 \le j \le l$, $B_m(M) = B_{cl}(m_{t_j}) = B_{cl}(m_{t_i})$ and $\forall 1 \le i \neq j \le l$, $m_{t_i}.u_{id} \neq m_{t_j}.u_{id}$. The latter follows, as $M$ forms an $l$-clique and, due to condition 3, two messages $m_{s_i}$ and $m_{s_j}$ do not have an edge between them in $G(S, E)$ if $m_{s_i}.u_{id} = m_{s_j}.u_{id}$, and we have $\forall 1 \le i \le l$, $m_{s_i}.u_{id} = m_{t_i}.u_{id}$.

It remains to be proved that the spatial and temporal resolution constraints are satisfied. To see this, consider any one of the three dimensions in our spatio-temporal space without loss of generality, say, the x-dimension. Let $x_{min} = \min_{1 \le i \le l} m_{s_i}.x$ and let $x_{max} = \max_{1 \le i \le l} m_{s_i}.x$. Since $M$ forms an $l$-clique in $G(S, E)$, from conditions 1 and 2, we have $\forall 1 \le i \le l$, $\{x_{min}, x_{max}\} \subseteq \Phi(m_{s_i}.x, m_{s_i}.d_x)$ and, thus, $\forall 1 \le i \le l$, $[x_{min}, x_{max}] \subseteq \Phi(m_{s_i}.x, m_{s_i}.d_x)$ from convexity. Using a similar argument for the other dimensions and noting that $B_m(M) = ([x_{min}, x_{max}], [y_{min}, y_{max}], [t_{min}, t_{max}])$, we have $\forall 1 \le i \le l$, $B_m(M) \subseteq B_{cn}(m_{s_i})$.

Now, we show that the right-hand side holds if we assume that the left-hand side holds. Since $\forall 1 \le i \le l$, $m_{t_i} = R(m_{s_i})$, from the definition of k-anonymity, we must have $\forall 1 \le i \le l$, $m_{s_i}.k \le l$. From the spatial and temporal containment requirements, we have $\forall 1 \le i \le l$, $L(m_{s_i}) \in B_m(M)$, and from the spatial and temporal resolution constraints, we have $\forall 1 \le i \le l$, $B_m(M) \subseteq B_{cn}(m_{s_i})$. These two imply that $\forall 1 \le i, j \le l$, $L(m_{s_i}) \in B_{cn}(m_{s_j})$, satisfying conditions 1 and 2. Again, from k-anonymity, we have $\forall 1 \le i \neq j \le l$, $m_{s_i}.u_{id} \neq m_{s_j}.u_{id}$, satisfying condition 3. Thus, $M$ forms an $l$-clique in $G(S, E)$, completing the proof. $\blacksquare$

3.3 Illustration of the Theorem

We demonstrate the application of this theorem with an example. Fig. 4 shows four messages: $m_1$, $m_2$, $m_3$, and $m_4$. Each message is from a different mobile client.$^2$ We omitted the time domain in this example for ease of explanation, but the extension to the spatio-temporal space is straightforward. Initially, the first three of these messages are inside the system. Spatial layout I shows how these three messages spatially relate to each other. It also depicts the spatial constraint boxes of the messages. Constraint graph I shows how these messages are connected to each other in the constraint graph. Since the spatial locations of messages $m_1$ and $m_2$ are mutually contained in each other's spatial constraint box, they are connected in the constraint graph, and $m_3$ lies apart by itself. Although $m_1$ and $m_2$ form a 2-clique, they cannot be anonymized and removed from the graph. This is because $m_2.k = 3$ and, as a result, the clique does not satisfy the CliqueCloak theorem. Spatial layout II shows the situation after $m_4$ arrives, and constraint graph II shows the corresponding status of the constraint graph. With the inclusion of $m_4$, there exists only one clique whose size is at least equal to the maximum k value of the messages that it contains. This clique is $\{m_1, m_2, m_4\}$. We can compute the MBR of the messages within the clique, use it as the spatio-temporal cloaking box of the perturbed messages, and then safely remove this clique. Fig. 4b clearly shows that the MBR is contained by the spatial constraint boxes of all messages within the clique. Once these messages are anonymized, the remaining message $m_3$ is not necessarily dropped from the system. It may later be picked up by some other new message and anonymized together with it, or it may be dropped if it cannot be anonymized until its deadline, specified by its temporal tolerance constraint.

Although, in the described example, we have found a single clique immediately after $m_4$ was received, we could have had cliques of different sizes to choose from. For instance, if $m_4.k$ was 2, then $\{m_3, m_4\}$ would have also formed a valid clique according to the CliqueCloak theorem. We address the questions of what kind of cliques to search and when and how we can search for such cliques in detail in Section 4.
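The example can be replayed with a small Python sketch that checks the condition of the CliqueCloak theorem on an explicit adjacency structure. The edges below follow the description of constraint graph II; the k values other than the stated value for m2 are not given in the text and are assumptions chosen to be consistent with the narrative, and the function names are hypothetical.

```python
from itertools import combinations


def is_valid_clique(group, adj, k):
    """A group qualifies if it is a clique and no member asks for more than its size."""
    members = list(group)
    pairwise = all(b in adj[a] for a, b in combinations(members, 2))
    return pairwise and all(k[m] <= len(members) for m in members)


def valid_groups(adj, k):
    """Enumerate every group of two or more messages that satisfies the theorem."""
    nodes = sorted(adj)
    found = []
    for size in range(2, len(nodes) + 1):
        for group in combinations(nodes, size):
            if is_valid_clique(group, adj, k):
                found.append(group)
    return found


# Constraint graph II: m1, m2, m4 are mutually connected; m3 is connected to m4 only.
adj = {
    "m1": {"m2", "m4"},
    "m2": {"m1", "m4"},
    "m3": {"m4"},
    "m4": {"m1", "m2", "m3"},
}
# m2.k = 3 is stated in the text; the other values are assumed for illustration.
k = {"m1": 3, "m2": 3, "m3": 2, "m4": 3}

print(valid_groups(adj, k))
print(valid_groups(adj, {**k, "m4": 2}))
```

The exhaustive enumeration is only practical for a toy graph like this one; the engine algorithms in Section 3.5 restrict the search to the neighborhood of the newly arrived message.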


1. The MBR of a set of points is the smallest rectangular region that would enclose all the points.

2. If a node sends a message while it already has one in the system, the new message can be easily detected.

3.4 Data Structures

We briefly describe the four main data structures that are used in the message perturbation engine.

The Message Queue $Q_m$ is a simple first-in, first-out (FIFO) queue which collects the messages sent from the mobile clients in the order that they are received. The messages are popped from this queue by the message perturbation engine in order to be processed.

The Multidimensional Index $I_m$ is used to allow efficient search on the spatio-temporal points of the messages. For each message, say, $m_s$, in the set of messages that are not yet anonymized and are not yet dropped according to the expiration condition specified by the temporal tolerance, $I_m$ contains a 3D point $L(m_s)$ as a key, together with the message $m_s$ as data. The index is implemented using an in-memory R-tree [17] in our system.

The Constraint Graph $G_m$ is a dynamic in-memory graph, which contains the messages that are not yet anonymized and not yet dropped due to expiration. The structure of the constraint graph is already defined in Section 3.2. The multidimensional index $I_m$ is mainly used to speed up the maintenance of the constraint graph $G_m$, which is updated when new messages arrive or when messages are anonymized or expired.

The Expiration Heap $H_m$ is a min-heap sorted based on the deadline of the messages. For each message, say, $m_s$, in the set of messages that are not yet anonymized and are not yet dropped due to expiration, $H_m$ contains a deadline $m_s.t + m_s.d_t$ as the key, together with the message $m_s$ as the data.$^3$ The expiration heap is used to detect expired messages that cannot be successfully anonymized so that they can be dropped and removed from the system.
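A minimal sketch of the expiration heap, using Python's standard `heapq` module, is given below. The class and method names are assumptions of this sketch. It stores only message identifiers as data and, for brevity, does not support removing a message that has been anonymized before its deadline; a fuller implementation would need such a removal or a lazy-deletion scheme.

```python
import heapq


class ExpirationHeap:
    """Min-heap of (deadline, message id), keyed by deadline = t + dt."""

    def __init__(self):
        self._heap = []

    def __len__(self):
        return len(self._heap)

    def push(self, msg_id, t, dt):
        # The deadline is the high point of the constraint box along the time axis.
        heapq.heappush(self._heap, (t + dt, msg_id))

    def pop_expired(self, now):
        """Remove and return ids of all messages whose deadline is before `now`."""
        expired = []
        while self._heap and self._heap[0][0] < now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired
```

Because the earliest deadline is always at the top, the expiration check stops at the first message that is still alive, so its cost is proportional to the number of expired messages rather than to the number of pending ones.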

3.5 Engine Algorithms

Upon the arrival of a new message, the perturbation engine will update the message queue (FIFO) to include this message. The message perturbation process works by continuously popping messages from the message queue and processing them for k-anonymity in four steps. The pseudocode of the perturbation engine is given in Algorithm 1.

Algorithm 1. Message Perturbation Engine.

MSGPERTENGINE()
The engine runs in its own thread, as long as the variable engine_running remains set to true. This variable is initialized to true when the engine is turned on and is set to false when the engine is closed down upon an explicit command.
(1)  while engine_running = true
(2)      if $Q_m \neq \emptyset$
(3)          $m_{s_c} \leftarrow$ Pop the first item in $Q_m$
(4)          Add $m_{s_c}$ into $I_m$ with $L(m_{s_c})$
(5)          Add $m_{s_c}$ into $H_m$ with $m_{s_c}.t + m_{s_c}.d_t$
(6)          Add the message $m_{s_c}$ into $G_m$ as a node
(7)          $N \leftarrow$ Range search $I_m$ using $B_{cn}(m_{s_c})$
(8)          foreach $m_s \in N$, $m_s \neq m_{s_c}$
(9)              if $L(m_{s_c}) \in B_{cn}(m_s)$
(10)                 Add the edge $(m_{s_c}, m_s)$ into $G_m$
(11)         $G'_m \leftarrow$ Subgraph of $G_m$ consisting of messages in $N$
(12)         $M \leftarrow$ LOCAL-k_SEARCH($m_{s_c}.k$, $m_{s_c}$, $G'_m$)
(13)         if $M \neq \emptyset$
(14)             Randomize the order of messages in $M$
(15)             foreach $m_s$ in $M$
(16)                 Output perturbed message $m_t \leftarrow \langle m_s.u_{id}, m_s.r_{no}, B_m(M), m_s.C \rangle$
(17)                 Remove the message $m_s$ from $G_m$, $I_m$, $H_m$
(18)         while true
(19)             $m_s \leftarrow$ Topmost item in $H_m$
(20)             if $m_s.t + m_s.d_t <$ now
(21)                 Remove the message $m_s$ from $G_m$, $I_m$
(22)                 Pop the topmost element in $H_m$
(23)             else break

Phase 1: Zoom-in. In this step, we update the data structures with the new message from the message queue and integrate the new message into the constraint graph, i.e., search all the messages pending for perturbation and locate the messages that should be assigned as neighbors to it in the constraint graph (zoom in). Concretely, when a message $m_{s_c}$ is popped from the message queue, it is inserted into the index $I_m$ by using $L(m_{s_c})$, inserted into the heap $H_m$ by using $m_{s_c}.t + m_{s_c}.d_t$, and inserted into the graph $G_m$ as a node. Then, the edges incident upon vertex $m_{s_c}$ are constructed in the constraint graph $G_m$ by searching the multidimensional index $I_m$ by using the spatio-temporal constraint box of the message, i.e., $B_{cn}(m_{s_c})$, as the range. The messages whose spatio-temporal points are contained in $B_{cn}(m_{s_c})$ are candidates for being $m_{s_c}$'s neighbors in the constraint graph. These messages (denoted as $N$ in the pseudocode) are filtered based on whether their spatio-temporal constraint boxes contain $L(m_{s_c})$. The ones that pass the filtering step become neighbors of $m_{s_c}$. We call the subgraph that contains $m_{s_c}$ and its neighbors the focused subgraph, denoted by $G'_m$ (see lines 3-11 in the pseudocode).


3. It is memorywise more efficient to store only identifiers of messages as data in $I_m$, $G_m$, and $H_m$.

Fig. 4. Illustration of the CliqueCloak theorem. (a) Spatial layout I. (b) Spatial layout II. (c) Constraint graph I. (d) Constraint graph II.

Phase 2: Detection. In this step, we apply the local-k search algorithm in order to find a suitable clique in the focused subgraph $G'_m$ of $G_m$, which contains $m_{s_c}$ and its neighbors in $G_m$, denoted by $nbr(m_{s_c}, G_m)$. Formally, $nbr(m_{s_c}, G_m)$ is defined as $\{m_s \mid (m_{s_c}, m_s) \text{ is an edge in } G_m\}$. In the local-k search, we try to find a clique of size $m_{s_c}.k$ that includes the message $m_{s_c}$ and satisfies the CliqueCloak theorem. The pseudocode of this step is given separately in Algorithm 2 as the function local-k Search. Note that the local-k Search function is called within Algorithm 1 (line 12), with parameter $k$ set to $m_{s_c}.k$. Before beginning the search, a set $U \subseteq nbr(m_{s_c}, G'_m)$ is constructed such that, for each message $m_s \in U$, we have $m_s.k \le k$ (lines 1 and 2). This means that the neighbors of $m_{s_c}$ whose anonymity values are higher than $k$ are simply discarded from $U$ as they cannot be anonymized with a clique of size $k$. Then, the set $U$ is iteratively filtered until there is no change (lines 3-8). At each filtering step, every message $m_s \in U$ is checked to see whether it has at least $k - 2$ neighbors in $U$. If not, then the message cannot be part of a clique that contains $m_{s_c}$ and has size $k$; thus, the message is removed from $U$. After the set $U$ is filtered, the possible cliques in $U \cup \{m_{s_c}\}$ that contain $m_{s_c}$ and have size $k$ are enumerated, and if one satisfying the k-anonymity requirements is found, then the messages in that clique are returned.

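Algorithm 2. Local-k search algorithm.

LOCAL-k_SEARCH($k$, $m_{s_c}$, $G'_m$)
(1)  $U \leftarrow \{m_s \mid m_s \in nbr(m_{s_c}, G'_m) \text{ and } m_s.k \le k\}$
(2)  if $|U| < k - 1$ then return $\emptyset$
(3)  $l \leftarrow 0$
(4)  while $l \neq |U|$
(5)      $l \leftarrow |U|$
(6)      foreach $m_s \in U$
(7)          if $|nbr(m_s, G'_m) \cap U| < k - 2$
(8)              $U \leftarrow U \setminus \{m_s\}$
(9)  Find any subset $M \subseteq U$, s.t. $|M| = k - 1$ and $M \cup \{m_{s_c}\}$ forms a clique
(10) if $M$ found then return $M$
(11) else return $\emptyset$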
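For concreteness, the local-k search can be sketched in Python as below. This is an illustrative rendering under stated assumptions, not the authors' implementation: the focused subgraph is represented as a dictionary `adj` mapping each message identifier to the set of its neighbors, `kval` maps identifiers to requested anonymity levels, and the final step simply enumerates candidate subsets with `itertools.combinations`, which is exponential in the worst case.

```python
from itertools import combinations


def local_k_search(k, msc, adj, kval):
    """Return k-1 neighbors of msc that form a clique with it, or an empty set."""
    # Lines 1-2: keep only neighbors whose requested level fits a clique of size k.
    U = {m for m in adj[msc] if kval[m] <= k}
    if len(U) < k - 1:
        return set()
    # Lines 3-8: repeatedly drop messages with fewer than k-2 neighbors inside U.
    changed = True
    while changed:
        changed = False
        for m in list(U):
            if len(adj[m] & U) < k - 2:
                U.discard(m)
                changed = True
    # Lines 9-11: look for k-1 mutually adjacent messages (all are adjacent to msc).
    for cand in combinations(sorted(U), k - 1):
        if all(b in adj[a] for a, b in combinations(cand, 2)):
            return set(cand)
    return set()
```

The pruning loop is cheap compared with the subset enumeration, so it pays off whenever it removes even a few low-degree neighbors before the combinatorial step.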

Phase 3: Perturbation. In this step, we generate the k-anonymized messages to be forwarded to the external LBS providers. If a suitable clique is found in the detection step, then the messages in the clique (denoted as $M$ in the pseudocode) are first randomized to prevent temporal correlations with message arrival times. Then, they are anonymized by assigning $B_m(M)$ (the MBR of the spatio-temporal points of the messages in the clique) as their cloaking box. Then, they are removed from the graph $G_m$, as well as from the index $I_m$ and the heap $H_m$. This step is detailed in the pseudocode through lines 13-17. In case a clique cannot be found, the message stays inside $I_m$, $G_m$, and $H_m$. It may be later picked up and anonymized during the processing of a new message or may be dropped when it expires.

Phase 4: Expiration. A message is considered to be expired if the current time is beyond the high point along the temporal dimension of the message's spatio-temporal constraint box, which means that the message has been delayed beyond its deadline. In this step, we take care of the expired messages. After the processing of each message, we check the expiration heap for any messages that have expired. The message on top of the expiration heap is checked, and if its deadline has passed, then it is removed from $I_m$, $G_m$, and $H_m$. Such a message cannot be anonymized and is dropped. This step is repeated until a message whose deadline is ahead of the current time is reached. Lines 18-23 of the pseudocode deal with expiration.

4 IMPROVED CliqueCloak ALGORITHMS

In this section, we describe several CliqueCloak algorithms that improve the performance of the base algorithm described in Section 3.5. These variations are introduced through configurations along the three dimensions shown in Fig. 5. These three dimensions represent three critical aspects of the clique search performed for locating a group of messages that can be anonymized together: 1) what sizes of message groups are searched, 2) when the search is performed, and 3) how the search is performed.

In the rest of this section, we discuss various optimizations that we propose along these three dimensions to improve the basic algorithm. All of the proposed optimizations are heuristic in nature. We would like to note that the general problem of the optimal k-anonymization is shown to be NP-hard [18], [19].

4.1 What Size Cliques to Search: Nbr-k versus Local-k

When searching for a clique in the focused subgraph, it is essential to ensure that the newly received message, say, $m_{s_c}$, is included in the clique. If there is a new clique formed due to the entrance of $m_{s_c}$ in the graph, then it must contain $m_{s_c}$. Thus, $m_{s_c}$ is a good starting position. In addition, we want to look for bigger cliques that include $m_{s_c}$ instead of searching for a clique with size $m_{s_c}.k$, provided that the k value of each message within the clique is smaller than or equal to the size of the clique. There are two strong motivations behind this approach. First, by anonymizing a larger number of messages at once, we can minimize the number of messages that have to wait for later arriving messages in order to be anonymized. Second, by anonymizing messages in larger groups, we can provide better privacy protection against linking attacks. We develop the nbr-k search algorithm based on these design guidelines. Its pseudocode is given in Algorithm 3.

Algorithm 3. nbr-k search algorithm.

NBR-k_SEARCH($m_{s_c}$, $G'_m$)
(1)  if $|nbr(m_{s_c}, G'_m)| < m_{s_c}.k - 1$ then return $\emptyset$
(2)  $V \leftarrow \{m_s.k \mid m_s = m_{s_c} \vee m_s \in nbr(m_{s_c}, G'_m)\}$
(3)  foreach distinct $k \in V$ in decreasing order
(4)      if $k < m_{s_c}.k$ then return $\emptyset$
(5)      $M \leftarrow$ LOCAL-k_SEARCH($k$, $m_{s_c}$, $G'_m$)
(6)      if $M \neq \emptyset$ then return $M$
(7)  return $\emptyset$

Fig. 5. Algorithmic dimensions.

The nbr-k search first collects the set of k values that the new message $m_{s_c}$ and its neighbors $nbr(m_{s_c}, G'_m)$ have, denoted as $V$ in the pseudocode. The k values in $V$ are considered in decreasing order until a clique is found or $k$ becomes smaller than $m_{s_c}.k$, in which case the search returns the empty set. For each $k \in V$ considered, a clique of size $k$ is searched by calling the local-k Search function with appropriate parameters (see line 5). If such a clique can be found, the messages within the clique are returned. To integrate the nbr-k search into the message perturbation engine, we can replace line 12 of Algorithm 1 with the call to the nbr-k Search function.
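The nbr-k strategy can be sketched as a thin wrapper around any local-k search routine. In the Python sketch below, `nbr_k_search` takes the local search as a callable; `brute_force_local_search` is a simplified stand-in written only to make the example runnable (it omits the pruning step of Algorithm 2). All names and the dictionary-based graph representation are assumptions of this sketch.

```python
from itertools import combinations


def brute_force_local_search(k, msc, adj, kval):
    """Stand-in for the local-k search: find k-1 mutually adjacent neighbors of msc."""
    pool = sorted(m for m in adj[msc] if kval[m] <= k)
    for cand in combinations(pool, k - 1):
        if all(b in adj[a] for a, b in combinations(cand, 2)):
            return set(cand)
    return set()


def nbr_k_search(msc, adj, kval, local_search):
    """Try the k values found around msc, largest first, down to msc's own k."""
    if len(adj[msc]) < kval[msc] - 1:
        return set()
    candidate_ks = {kval[msc]} | {kval[m] for m in adj[msc]}
    for k in sorted(candidate_ks, reverse=True):
        if k < kval[msc]:
            return set()
        found = local_search(k, msc, adj, kval)
        if found:
            return found
    return set()


# Four mutually connected messages; the new message "c" only asks for k = 2.
adj = {
    "c": {"a", "b", "d"},
    "a": {"c", "b", "d"},
    "b": {"c", "a", "d"},
    "d": {"c", "a", "b"},
}
kval = {"c": 2, "a": 2, "b": 3, "d": 4}
print(nbr_k_search("c", adj, kval, brute_force_local_search))
```

In this toy graph, a local-k search with k = 2 would anonymize the new message with a single neighbor, whereas the nbr-k search finds the full group of four and also satisfies the neighbors that requested higher anonymity levels.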

4.2 When to Search for Cliques: Deferred versus Immediate

We have described the algorithm of searching for cliques upon the arrival of a new message and refer to this type of search as immediate search. The immediate search is not beneficial when the constraint graph is sparse around the new message, and the anonymization is less likely to be successful. Thus, it may result in an increased number of unsuccessful searches and deteriorate the performance.

Instead of immediately searching for a clique for each newly arrived message, we can defer this processing. One extreme approach to the deferred search is to postpone the clique search for every message and perform the clique search for a deferred message at the time of its expiration if it has not yet been anonymized together with other messages. However, this will definitely increase the delay for all messages. An alternative approach to the deferred search is to postpone the search only if the new message does not have enough neighbors around and, thus, the constraint graph around this new message is sparse. Concretely, we introduce a system parameter $\alpha \ge 1$ to adjust the number of messages for which the clique search is deferred and perform the clique search for a new message $m_{s_c}$ only if the number of neighbors that this new message has at its arrival time is larger than or equal to $\alpha \cdot m_{s_c}.k$. Smaller $\alpha$ values push the algorithm toward immediate processing. We can set the $\alpha$ value statically at compile time based on experimental studies or adaptively during runtime by observing the rate of successful clique searches with different $\alpha$ values. We refer to this variation of the clique search algorithm as deferred CliqueCloak. The main idea of the deferred search is to wait until a certain message density is reached and then perform the clique search. Thus, the deferred approach performs a smaller number of clique searches at the cost of larger storage and data structure maintenance overhead.

4.3 How to Search Cliques: Progressive versus One Time

Another important aspect that we can use to optimize the clique search performance is how the search itself is carried out. It is interesting to note that, when the constraint boxes of messages are large, messages are more likely to be anonymized since the constraints are relaxed. However, when the constraint boxes of messages are large, the clique searches do not terminate early and incur a high performance penalty. We observe that this is due to the increased search space of the clique search phase, which is a direct consequence of the fact that large constraint boxes result in a large number of neighbors around the messages in the constraint graph. This inefficiency becomes more prominent with increasing k due to the combinatorial nature of the search. An obvious way to improve the search is to first consider neighbors that are spatially close by, which allows us to terminate our search quickly and avoid or reduce the processing time spent on the neighbors that are spatially far away and potentially less useful for anonymization.

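Algorithm 4. Progressive search algorithm.

PROGRESSIVESEARCH($m_{s_c}$, $G'_m$)
(1)  $U \leftarrow$ Sort $nbr(m_{s_c}, G'_m)$ based on the Euclidean distance between $L(m_s)$ and $L(m_{s_c})$, $m_s \in nbr(m_{s_c}, G'_m)$
(2)  $z \leftarrow 1$
(3)  repeat
(4)      $z \leftarrow z + 1$
(5)      $v \leftarrow \text{MIN}(z \cdot m_{s_c}.k, |U|)$
(6)      $G''_m \leftarrow$ Subgraph of $G'_m$ containing $m_{s_c}$ and the first $v - 1$ messages in $U$
(7)      $M \leftarrow$ LOCAL-k_SEARCH($m_{s_c}.k$, $m_{s_c}$, $G''_m$)   // or NBR-k_SEARCH($m_{s_c}$, $G''_m$)
(8)      if $M \neq \emptyset$ then return $M$
(9)  until $v = |U|$
(10) return $\emptyset$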

The progressive search technique builds upon this insight. It first sorts (in increasing order) the neighbors of a message $m_{s_c}$ based on the distance of their spatio-temporal point to $m_{s_c}$'s spatio-temporal point. Then, the search (either local-k or nbr-k) is performed iteratively over the set of sorted messages by using a progressively enlarging subgraph that consists of a progressively increasing number of messages from the sorted set of messages. Initially, the subgraph consists of only $z \cdot m_{s_c}.k$ messages, where $z = 2$. If a proper clique cannot be found, then we increase $z$ by 1 and apply the search over the enlarged subgraph again. This process repeats until the complete set of sorted messages is exhausted. Algorithm 4 gives a sketch of the progressive search. For the purpose of comparison, we refer to the nonprogressive search as one-step search.
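A hedged Python sketch of the progressive strategy follows. The inner search is passed in as a callable with the signature `local_search(k, msc, adj, kval)`; this signature, the dictionary-based graph, and the `loc` mapping from message identifiers to `(x, y, t)` points are assumptions of the sketch. Unlike the pseudocode, the sketch keeps the first v sorted neighbors (rather than v - 1) in each round, so that the final round covers every neighbor.

```python
import math


def progressive_search(msc, adj, kval, loc, local_search):
    """Run local_search on growing subgraphs of the nearest neighbors of msc."""
    k = kval[msc]
    # Sort neighbors by Euclidean distance to msc; ties are broken by identifier.
    ordered = sorted(adj[msc], key=lambda m: (math.dist(loc[m], loc[msc]), m))
    z = 1
    while True:
        z += 1
        v = min(z * k, len(ordered))
        keep = set(ordered[:v]) | {msc}
        # Restrict the adjacency structure to msc and its v nearest neighbors.
        sub = {m: adj[m] & keep for m in keep}
        found = local_search(k, msc, sub, kval)
        if found:
            return found
        if v >= len(ordered):
            return set()   # every neighbor has been considered
```

Each unsuccessful round repeats work on the neighbors already examined, so the approach helps most when a clique is usually found among the closest neighbors, which is the situation the paragraph above describes for large constraint boxes.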

5 EVALUATION METRICS

In this section, we discuss several evaluation metrics for system level control of the balance between the privacy value and performance implication in terms of QoS. These metrics can be used to evaluate the effectiveness and the efficiency of the message perturbation engine.

Success rate is an important measure for evaluating the effectiveness of the proposed location k-anonymity model. Concretely, the primary goal of the cloaking algorithm is to maximize the number of messages perturbed successfully in accordance with their anonymization constraints. In other words, we want to maximize $|T|$. The success rate can be defined over a set $S' \subseteq S$ of messages as the percentage of messages that are successfully anonymized (perturbed), that is,

$$\frac{100 \cdot |\{m_t \mid m_t = R(m_s),\ m_t \in T,\ m_s \in S'\}|}{|S'|}.$$

Important measures of efficiency include relative anonymity level, relative temporal resolution, relative spatial resolution, and message processing time. The first three are measures related to QoS, whereas the last one is a performance measure. It is important to remember that the primary evaluation metric for our algorithms is the success rate. This is because our approach always guarantees that the anonymization constraints are respected for the messages that are successfully anonymized. When comparing two approaches that have similar success rates, one that provides a better relative anonymity level or relative spatial/temporal resolution is preferred.

The relative anonymity level is a measure of the level of anonymity provided by the cloaking algorithm, normalized by the level of anonymity required by the messages. We define the relative anonymity level over a set $T' \subseteq T$ of perturbed messages by

$$\frac{1}{|T'|} \sum_{m_t = R(m_s) \in T'} \frac{|\{m \mid m \in T \wedge B_{cl}(m_t) = B_{cl}(m)\}|}{m_s.k}.$$

Note that the relative anonymity level cannot go below 1. Higher relative anonymity levels mean that, on the average, messages are getting anonymized with larger k values than the user-specified minimum k-anonymity levels. Due to the inherent trade-off between the anonymity level and the spatial and temporal resolutions, a user may have to specify a lower k value than what she actually desires in order to maintain a certain amount of spatial resolution and/or temporal resolution for the service request messages. In these cases, we will prefer algorithms that can provide higher relative anonymity levels.

The relative spatial resolution is a measure of the spatial resolution provided by the cloaking algorithm, normalized by the minimum acceptable spatial resolution defined by the spatial tolerances. We define the relative spatial resolution over a set of perturbed messages $T' \subseteq T$ by

$$\frac{1}{|T'|} \sum_{m_t = R(m_s) \in T'} \sqrt{\frac{2\, m_s.d_x \cdot 2\, m_s.d_y}{\|m_t.X\| \cdot \|m_t.Y\|}},$$

where $\|\cdot\|$ is applied to an interval and gives its length. The numerator in the equation is the area of the constraint box for the source message, whereas the denominator is the area of the cloaking box for its transformed format. Higher relative spatial resolution values imply that anonymization is performed with smaller spatial cloaking regions relative to the constraint boxes specified.

The relative temporal resolution is a measure of the temporal resolution provided by the cloaking algorithm, normalized by the minimum acceptable temporal resolution defined by the temporal tolerances. We define the relative temporal resolution over a set of perturbed messages $T' \subseteq T$ by

$$\frac{1}{|T'|} \sum_{m_t = R(m_s) \in T'} \frac{2\, m_s.d_t}{\|m_t.I\|}.$$

Higher relative temporal resolution values imply that anonymization is performed with smaller temporal cloaking intervals and, thus, with smaller delays due to perturbation. Relative spatial and temporal resolutions cannot go below 1.
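The three relative measures can be computed with a few lines of Python, sketched below. The function names and the flat tuple inputs are assumptions of this sketch: each record carries the tolerances of the source message together with the side lengths of the cloaking box of its perturbed counterpart, and all cloaking-box lengths are assumed to be strictly positive.

```python
import math


def relative_anonymity_level(records):
    """records: (group_size, k) pairs, where group_size counts messages sharing the box."""
    return sum(size / k for size, k in records) / len(records)


def relative_spatial_resolution(records):
    """records: (dx, dy, x_len, y_len); constraint-box area over cloaking-box area."""
    ratios = [math.sqrt((2 * dx * 2 * dy) / (x_len * y_len))
              for dx, dy, x_len, y_len in records]
    return sum(ratios) / len(ratios)


def relative_temporal_resolution(records):
    """records: (dt, i_len); constraint interval length over cloaking interval length."""
    return sum((2 * dt) / i_len for dt, i_len in records) / len(records)
```

For instance, a message with spatial tolerances of 100 m in each direction that is cloaked into a 50 m by 50 m box contributes sqrt(200 * 200 / (50 * 50)) = 4 to the relative spatial resolution.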

The message processing time is a measure of the runtime performance of the message perturbation engine. The message processing time may become a critical issue if the computational power at hand is not enough to handle the incoming messages at a high rate. In the experiments in Section 6, we use the average CPU time needed to process $10^3$ messages as the message processing time.

6 EXPERIMENTAL STUDY

We break up the experimental evaluation into three components. The first two components demonstrate the effectiveness of the CliqueCloak algorithms in realizing the proposed personalized location k-anonymity model in terms of the success rate, relative anonymity level, and relative spatial/temporal resolution. The third component studies the scalability of the algorithms under extreme cases in terms of the runtime performance. Before presenting our experimental results, we first describe the trace generator used to generate realistic traces that are employed in the experiments and the details of our experimental setup.

6.1 Experimental Setup

We have developed a trace generator (shown in Fig. 6), which simulates cars moving on roads and generates requests using the position information from the simulation. The trace generator loads real-world road data, available from the National Mapping Division of the US Geological Survey (USGS) [20] in Spatial Data Transfer Standard (SDTS) [21] format. We use a transportation layer of 1:24,000 Digital Line Graphs (DLGs) as road data. We convert the graphs into Scalable Vector Graphics [22] format using the Global Mapper [23] software and use them as input to our trace generator. We extract three types of roads from the trace graph: class 1 (expressway), class 2 (arterial), and class 3 (collector). The generator uses real traffic volume data to calculate the total number of cars for different road classes. The total number of cars on a certain class of roads is proportional to the total length of the roads for that class and the traffic volume for that class and is inversely proportional to the average speed of cars for that class. Once the number of cars on each type of road is determined, they are randomly placed into the graph, and the simulation begins. Cars move on the roads and take other roads when they reach joints. The simulator tries to keep the fraction of cars on each type of road constant as time progresses. A car changes its speed at each joint based on a normal distribution whose mean is equal to the average speed for the particular class of roads that the car is on.

Fig. 6. The trace generator.

We used a map from the Chamblee region of the state of Georgia to generate the trace used in this paper. Fig. 6 shows this map loaded into the trace generator. The map covers a region of $\approx 160\ \text{km}^2$. In terms of the length of roads, class-1 roads constitute 7.3 percent of the total, whereas class-2 and class-3 roads constitute 5.4 percent and 87.3 percent, respectively. The mean speeds and standard deviations for each road type are given in Table 2. The traffic volume data is taken from Gruteser and Grunwald [8] and is also listed in Table 2. These settings result in approximately 10,000 cars, 32 percent of which are on class-1 roads, 13 percent are on class-2 roads, and 55 percent are on class-3 roads. The trace has the duration of 1 hour. Each car generates several messages during the simulation. All evaluation metrics are calculated over these messages generated within the 1-hour period$^4$ (over 1,000,000 messages). The experimental results are averages of a large number of messages. Each message specifies an anonymity level in the form of a k value, which is picked from the list {5, 4, 3, 2} by using a Zipf distribution with parameter 0.6. The setting of $k = 5$ is the most popular one, and $k = 2$ is the least popular one based on the Zipf distribution. In certain experiments, we extend this list up to $k = 12$, keeping the highest k value as the most popular anonymity level. This enables us to model a population that prefers higher privacy in general. We show that, even for such a workload, the personalized k-anonymity model provides significant gains.
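To make the workload description concrete, the following Python sketch draws k values from the ordered list {5, 4, 3, 2} with Zipf-like weights. It assumes the common rank-based form in which the item at rank r receives a weight proportional to 1 / r^s with s = 0.6; the text does not spell out the exact formulation used by the trace generator, so this form, the function names, and the seed handling are all assumptions.

```python
import random


def zipf_weights(n, s):
    """Normalized weights proportional to 1 / rank**s for ranks 1..n."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_k_values(count, k_list=(5, 4, 3, 2), s=0.6, seed=None):
    """Draw `count` anonymity levels; k_list is ordered from most to least popular."""
    rng = random.Random(seed)
    return rng.choices(list(k_list), weights=zipf_weights(len(k_list), s), k=count)
```

With a small exponent such as 0.6 the distribution is only mildly skewed, so the least popular level still appears in a sizable fraction of the generated messages.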

The spatial and temporal tolerance values of the messages are selected using normal distributions whose default parameters are given in Table 2. Whenever a message is generated, the originator of the message waits until the message is anonymized or dropped, after which it waits for a normally distributed amount of time, called the interwait time, whose default parameters are listed in Table 2.

All parameters take their default values if not stated otherwise. We change many of these parameters to observe the behavior of the algorithms in different settings. For the spatial points of the messages, the default settings result in anonymizing around 70 percent of messages with an accuracy of < 18 m in 75 percent of the cases, which we consider to be very good when compared to the E-911 requirement of 125 m accuracy in 67 percent of the cases [6]. For the temporal point of the messages, the default parameters also result in a delay of < 10 s in 75 percent of the cases and < 5 s in 50 percent of the cases. Our results show that the personalized location k-anonymity approach presented in this paper is a promising solution. Although there are many aspects of the experimental design, such as car movement patterns, privacy and QoS requirements of users, message generation rates, and so forth, which can affect the results of our experiments, we believe that our results provide a necessary and informative first step to understand the fundamental characteristics of this personalized location privacy model. We hope that the results presented in this paper will stimulate more research to comprehensively evaluate the applicability of personalized location privacy in real-world LBS applications.

6.2 Effectiveness of the Personalized k-Anonymity Model

We first study the effectiveness of our personalized location k-anonymity model with respect to 1) different k requirements from individual users and 2) the uniform k-anonymity model.

Table 4 shows the advantage of using variable k compared to using a uniform k value of 5, independent of the individual k values specified in the messages. We observe that the variable k approach provides a 33 percent higher success rate, 110 percent better relative spatial resolution, and 30 percent better relative temporal resolution for messages with k = 2. The improvements are higher for messages with smaller k values, which implies that the variable k location anonymity approach does not unnecessarily penalize users with low privacy requirements when compared to the uniform k approach. The amount of improvement in terms of the evaluation metrics decreases as k approaches its maximum value of 5.

6.3 Results on Success Rate

Fig. 7 shows the success rate for the nbr-k and local-k approaches. The success rate is shown (on the y-axis) for different groups of messages, each group representing messages with a certain k value (on the x-axis). The two leftmost bars show the success rate for all of the messages. The wider bars show the actual success rate provided by the CliqueCloak algorithm. The thinner bars represent a lower bound on the percentage of messages that cannot be anonymized no matter what algorithm is used. This lower


4. From running 15-minute traces 12 times, we have observed insignificant standard deviation values (≈ 0.4 for a mean success rate of ≈ 70) and, thus, decided to report the results from a single 1-hour trace.

TABLE 2. Message Generation Parameters

TABLE 3. Car Movement Parameters

TABLE 4. Success Rate and Relative Spatial/Temporal Resolutions with Fixed k Compared to Variable k



bound is calculated as follows: For a message $m_s$, if the set $U = \{ m_{s_i} \mid m_{s_i} \in S \wedge L(m_{s_i}) \in B_{cn}(m_s) \wedge m_{s_i}.t \in \Phi(m_s.t, m_s.d_t) \}$ has size less than $m_s.k$, then the message cannot be anonymized. This is because the total number of messages that ever appear inside $m_s$'s constraint box during its lifetime is less than $m_s.k$. However, if the set $U$ has size of at least $m_s.k$, then the message $m_s$ may still not be anonymized under a hypothetical optimal algorithm. This is because the optimal choice may require anonymizing a subset of $U$ that does not include $m_s$, together with some other messages not in $U$. As a result, the remaining messages in $U$ may not be sufficient to anonymize $m_s$. It is not possible to design an online algorithm that is optimal in terms of success rate due to the fact that such an algorithm will require future knowledge of messages, which is not known beforehand. If a trace of the messages is available, as in this work, then the optimal success rate can be computed offline. However, we are not aware of a time-and-space-efficient offline algorithm for computing the optimal success rate. As a result, we use a lower bound on the number of messages that cannot be anonymized.
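
The lower bound can be computed from a recorded trace with a direct quadratic scan. The sketch below is a hedged illustration rather than the evaluation code used in the experiments: the `Message` fields and function names are hypothetical, the constraint box is assumed to be the symmetric range [v − d, v + d] in each dimension, a message is counted as a member of its own set U, and sender identity is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    x: float    # spatial point
    y: float
    t: float    # temporal point
    k: int      # requested anonymity level
    dx: float   # spatial tolerances
    dy: float
    dt: float   # temporal tolerance

def in_constraint_box(m, other):
    """True if other's spatio-temporal point lies in m's constraint box."""
    return (abs(other.x - m.x) <= m.dx
            and abs(other.y - m.y) <= m.dy
            and abs(other.t - m.t) <= m.dt)

def surely_unanonymizable(m, trace):
    """True if fewer than m.k messages ever fall inside m's constraint box."""
    u_size = sum(1 for other in trace if in_constraint_box(m, other))
    return u_size < m.k

def drop_lower_bound(trace):
    """Fraction of messages that no algorithm could anonymize."""
    return sum(surely_unanonymizable(m, trace) for m in trace) / len(trace)

trace = [
    Message(0.0, 0.0, 0.0, k=2, dx=10, dy=10, dt=5),
    Message(3.0, 4.0, 1.0, k=2, dx=10, dy=10, dt=5),
    Message(500.0, 500.0, 2.0, k=2, dx=10, dy=10, dt=5),  # isolated message
]
bound = drop_lower_bound(trace)
```

A spatial index would be needed to make this scan practical for a trace of over a million messages; the nested loop is kept here only for clarity.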

There are three observations from Fig. 7. First, the nbr-k approach provides an average success rate around 15 percent better than that of local-k. Second, the best average success rate achieved is around 70 percent. Out of the 30 percent of dropped messages, at least 65 percent cannot be anonymized by any algorithm, meaning that, in the worst case, the remaining 10 percent of all messages are dropped due to nonoptimality of the algorithm with respect to success rate. If we knew of a way to construct the optimal algorithm with a reasonable time and space complexity, given full knowledge of the trace, we could have obtained a better bound. Last, messages with larger k values are harder to anonymize. The success rate for messages with k = 2 is around 30 percent higher than the success rate for messages with k = 5.
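
The 10 percent figure in the second observation follows directly from the two rounded percentages quoted there. As a worked check, using those rounded values:

```latex
\[
\underbrace{0.30}_{\text{dropped}} \times
\underbrace{(1 - 0.65)}_{\text{not provably unanonymizable}}
= 0.105 \approx 10\% \text{ of all messages}
\]
```

This is an upper limit on the loss attributable to the heuristic, since the 65 percent figure is itself only a lower bound.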

Fig. 8 shows the relative anonymity level (the higher, the better) for nbr-k and local-k. The relative anonymity level is shown (on the y-axis) for different groups of messages, each group representing messages with a certain k value (on the x-axis). Nbr-k shows a relative anonymity level of 1.7 for messages with k = 2, meaning that, on average, these messages are anonymized with k = 3.4 by the algorithm. Local-k shows a lower relative anonymity level of 1.4 for messages with k = 2. This gap between the two approaches vanishes for messages with k = 5, since neither algorithm attempts to search for cliques of sizes larger than the maximum k value in the system. The difference in the relative anonymity level between nbr-k and local-k shows that the nbr-k approach is able to anonymize messages with smaller k values together with the ones with higher k values. This is particularly beneficial for messages with higher k values as they are harder to anonymize. This also explains why nbr-k results in a better success rate.

We also studied the average success rate and the message processing time for the nbr-k and local-k search approaches with the immediate or deferred processing mode. The results can be found in our technical report [13]. In summary, we found that the immediate approach provides a better success rate than the deferred approach and that the deferred approach does not provide improvement in terms of the message processing time even though it decreases the number of times that the clique search is performed. The reason that the deferred approach performs worse in terms of the total processing time is that, for k ≤ 10, the index update dominates the cost of processing the messages, and the deferred approach results in a more crowded index. However, the deferred approach is promising in terms of message processing time for cases where k values are really large and, thus, the clique search phase dominates the cost.

Fig. 9 plots the average success rate as a function of the mean interwait time and mean temporal tolerance. Similarly, Fig. 10 plots the average success rate as a function of the mean interwait time and mean spatial tolerance. For both figures, the variances are always set to 0.4 times the means. We observe that, the smaller the interwait time, the higher the success rate. For smaller values of the temporal and spatial tolerances, the decrease in the interwait time becomes more important in terms of keeping the success rate high. When the interwait time is high, we have a lower


Fig. 7. Success rates for different k values.

Fig. 8. Relative anonymity levels for different k values.

Fig. 9. Success rate with respect to temporal tolerance and interwait time.



processing time per $10^3$ anonymized messages as a function of the scaling factor for k = 12 with the progressive search and one-step search. We observe that the progressive search successfully removes most of the increasing trend seen in the one-step search. However, it is not completely removed, as can be seen from the small increase when the scaling factor goes from 2.5 to 3. This also points out that there should be a system-specified maximum constraint box to stop the performance degradation with unnecessarily large constraint boxes.

6.6 Summary of Experimental Results

We summarize major findings from our experiments and the insights obtained from the experimental results in four points:

1. Nbr-k outperforms local-k in both success rate and relative anonymity level metrics without incurring extra processing overhead. This is due to its ability to anonymize larger groups of messages together at once.

2. The deferred search, a technique that aims at decreasing the number of clique searches performed in an effort to increase runtime performance, turns out to be inferior to the immediate search. This is because, for smaller k values, the index search and update cost is dominant over the clique search cost and the deferred search increases the size of the index due to batching more messages before performing the clique searches.

3. The progressive search improves the runtime performance of anonymization, especially when constraint boxes and k values are large, without any side effects on other evaluation metrics. This behavior stems from the proximity-aware nature of the progressive search: The close-by messages that are more likely to be included in the result of the clique search are considered first.

4. The CliqueCloak algorithms have the nice property that, for most of the anonymized messages, the cloaking box generated is much smaller than the constraint box of the received message specified by the tolerance values, resulting in higher relative spatial and temporal resolutions.

In conclusion, the configuration of [nbr-k, immediate, progressive] is superior to other alternatives.

7 DISCUSSIONS AND FUTURE WORK

This section discusses some potential improvements, alternatives, and future directions of our work.

7.1 Success Rate and QoS versus Privacy Trade-Off

Our personalized k-anonymity model requires mobile clients to specify their desired location anonymity level and their spatial/temporal tolerance constraints. It is possible that the level of privacy and the QoS can be in conflict in a user's specification. When such conflicts occur, the success rate of anonymization will be low for this user's messages. In practice, such conflicts should be checked to determine the need for fine-tuning in the privacy level or QoS. The trade-off between the QoS defined by the spatial/temporal tolerance constraints and the level of privacy protection defined by the anonymity level k should be adjusted such that the success rate of anonymization is kept close to 1. In this paper, we developed a location anonymization framework and associated system-level facilities for fine-tuning of the QoS versus privacy protection trade-off. Due to the space constraint, we did not discuss the application-dependent management of user-involved adjustment of this trade-off. We believe that these issues merit an independent study.

7.2 Optimality of the CliqueCloak Algorithms

It is important to note that the CliqueCloak algorithms that we introduced in this paper are heuristic in nature. Although we do not know the best success rates that can be achieved for various distributions of anonymity constraints, we experimentally showed that, for practical scenarios in the worst case, our algorithms drop only 10 percent of the messages due to nonoptimality. Furthermore, since it is extremely hard to accurately predict future patterns of messages, it is difficult to build an online optimal algorithm. These two observations lead us to the conclusion that our algorithms will be highly effective in practice. However, it is an open problem to study advanced algorithms that have better optimality and runtime performance.

7.3 Pseudonymous and Nonanonymous LBSs

In this paper, we assumed that the LBSs are anonymous; that is, the true identities of mobile clients are not required in the services provided. Services that require the knowledge of user identities or pseudonyms (nonanonymous and pseudonymous LBSs) will make the tracking of successive messages from the same users trivial at the LBS side. We believe that the pseudonymous LBSs can benefit from our solution with some modifications. For instance, one complication may arise when successive location-identity bindings take place and the sets of k messages from the two adjacent bindings share only one pseudonym, which can easily lead to a trajectory-identity binding. These types of vulnerabilities can be prevented or mitigated by setting proper time intervals for changing the pseudonyms associated with mobile clients, without violating the service requirements of the LBSs. Nevertheless, further research is needed for devising effective techniques for performing privacy-preserving pseudonym updates. In the case of nonanonymous LBSs, we believe that the location privacy protection will need to be guaranteed through policy-based solutions managed by LBS providers. Policy-based solutions require mobile clients to completely trust the LBS providers in order to use the services provided.

8 RELATED WORK

8.1 Location Privacy and Anonymity

In the telecommunications domain, policy-based approaches have been proposed for protecting location privacy [9], [24]. Users may use policies to specify their privacy preferences. These policies specify what data about the user can be collected, when and for which purposes it can be used, and how and to whom it can be distributed. Mobile clients have to trust that the LBSs adequately protect the location information.

A k-anonymity-based approach is another approach to location privacy. It depersonalizes data through perturbation techniques before forwarding it to the LBS providers.




Location k-anonymity was first studied by Gruteser and Grunwald [8]. Its location perturbation is performed by a quadtree-based algorithm executing spatial and temporal cloaking. However, this work suffers from several drawbacks. First, it assumes a systemwide static k value for all mobile clients, which hinders the service quality for those mobile clients whose privacy requirements can be satisfied using smaller k values. Furthermore, this assumption is far from optimal, as mobile clients tend to have varying privacy protection requirements under different contexts and on different subjects. Second, their approach fails to provide any QoS guarantees with respect to the sizes of the cloaking boxes produced. This is because the quadtree-based algorithm anonymizes the messages by dividing the quadtree cells until the number of messages in each cell falls below k and by returning the previous quadrant for each cell as the spatial cloaking box of the messages under that cell. In comparison, our framework for location k-anonymity captures the desired degree of privacy and service quality on a per-user basis, supporting mobile clients with diverse context-dependent location privacy requirements. Our message perturbation engine can anonymize a stream of incoming messages with different k anonymization constraints. Unlike earlier work [8], we do not assume knowledge of user positions at the anonymity server at all times. Our work assumes that the user positions are known at the anonymity server side only to the extent that they can be deduced from the users' request messages. Our proposed location cloaking algorithms are effective in terms of the success rate of the message perturbation and the amount of QoS loss due to location cloaking. They are also flexible in the sense that each user can specify a personalized k value, as well as spatial and temporal tolerance values at a per-message granularity, to adjust the requested level of privacy protection and to bound the amount of loss in spatial resolution and the temporal delay introduced during the message perturbation. The spatial and temporal tolerances in our model provide a flexible way of independently adjusting the level of spatial and temporal cloaking performed.
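
To make the QoS limitation of the quadtree-based scheme concrete, the following sketch implements the subdivision rule as it is described above: keep descending into the quadrant that holds the subject while that quadrant still contains at least k points, and return the last quadrant that did. It is a reading of that one-sentence description, not the implementation from [8]; the function names, the half-open boxes, and the `max_depth` guard are assumptions of the example.

```python
def contains(box, p):
    """Half-open containment test for box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def quadrant_of(box, p):
    """Return the quadrant of box that contains point p."""
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    qx0, qx1 = (x0, mx) if p[0] < mx else (mx, x1)
    qy0, qy1 = (y0, my) if p[1] < my else (my, y1)
    return (qx0, qy0, qx1, qy1)

def quadtree_cloak(subject, others, k, area, max_depth=32):
    """Smallest quadrant around subject that still holds at least k points.

    Returns None when even the whole area holds fewer than k points.
    """
    points = [subject] + list(others)
    if sum(contains(area, p) for p in points) < k:
        return None
    box = area
    for _ in range(max_depth):  # guard against coincident points
        child = quadrant_of(box, subject)
        if sum(contains(child, p) for p in points) < k:
            return box          # the "previous quadrant"
        box = child
    return box

area = (0.0, 0.0, 8.0, 8.0)
others = [(1.5, 1.5), (6.0, 6.0)]
tight = quadtree_cloak((1.0, 1.0), others, k=2, area=area)
loose = quadtree_cloak((1.0, 1.0), others, k=3, area=area)
```

In this toy setting, raising k from 2 to 3 grows the cloaking box from a 1 × 1 cell to the entire 8 × 8 area. The box size is dictated by the local density and the fixed grid, with no per-message tolerance to bound it.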

8.2 Anonymity Support in Databases

In the database community, there exists a large amount of literature on security control against the disclosure of confidential information. Such disclosures may occur if, through the answer to one or more queries, an adversary can infer the exact value of or an accurate estimate of a confidential attribute of an individual. Privacy protection mechanisms suggested in the statistical databases literature can be classified under three general methods, namely, query restriction, data perturbation, and output perturbation. In query restriction, the queries are evaluated against the original database, but the results are only reported if the queries meet certain requirements. There are many flavors of query restriction, like restricting the number of entities in the result set [25], controlling the overlap among successive queries [26], keeping up-to-date logs of all queries and checking for compromises whenever a new query is issued [27], and clustering individual entities in mutually exclusive subsets and restricting the queries to the statistical properties of these subsets [28]. In data perturbation, the database is perturbed and the queries are evaluated against the perturbed database. This is usually done by replacing the database with a sample of it [29] or by perturbing the values of the attributes in the database [30]. In output perturbation, the results to the queries are perturbed, whereas the original database is not. This is commonly achieved by sampling the query results [31] or by introducing a varying perturbation (not permanent) to the data that are used to compute the result of a given query [32].

Another piece of related research is computing over encrypted data values in data mining and database queries. One representative work in recent years is the privacy-preserving indexing technique proposed by Hore et al. for supporting efficient query evaluation over encrypted data [33]. This work is based on the Database as a Service (DAS) model, where the service providers store encrypted data owned by the content provider in their servers and provide query services over the encrypted data. User queries are translated into two parts: a server-side query that works over the encrypted data and a client-side query that does decryption and postprocessing over the results of the server query part. Although the motivation and the problems addressed in our paper are different, our work shares a common assumption with the DAS work in the sense that the content producer does not trust the service providers and, thus, provides a privacy-preserving index instead of the actual data content in the DAS scenario or the perturbed location data through the location anonymizer in the anonymous LBS scenario, as discussed in this paper.

Finally, Samarati and Sweeney have developed a k-anonymity model [12], [11], [10] for protecting data privacy and a set of generalization and suppression techniques for safeguarding the anonymity of individuals whose information is recorded in database tables. Our work makes use of this basic idea of k-anonymity.
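
For readers unfamiliar with the tabular form of the idea, a table is k-anonymous when every combination of quasi-identifier values that occurs in it is shared by at least k rows. The check below is a minimal sketch of that definition only; the column names and the already generalized sample rows are made up for illustration and do not come from the cited work.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical table after generalization (ZIP code truncated, age bucketed).
rows = [
    {"zip": "303**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "303**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "303**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "303**", "age": "30-39", "diagnosis": "asthma"},
]

two_anonymous = is_k_anonymous(rows, ["zip", "age"], 2)
three_anonymous = is_k_anonymous(rows, ["zip", "age"], 3)
```

Location k-anonymity applies the same requirement to messages, with the spatio-temporal cloaking box playing the role of the generalized quasi-identifier.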

9 CONCLUSION

We proposed a personalized k-anonymity model for providing location privacy. Our model allows mobile clients to define and modify their location privacy specifications at the granularity of single messages, including the minimum anonymity level requirement, and the inaccuracy tolerances along the temporal and spatial dimensions. We developed an efficient message perturbation engine to implement this model. Our message perturbation engine can effectively anonymize messages sent by the mobile clients in accordance with location k-anonymity while satisfying the privacy and QoS requirements of the users. Several variations of the spatio-temporal cloaking algorithms, collectively called the CliqueCloak algorithms, are proposed as the core algorithms of the perturbation engine. We experimentally studied the behavior of our algorithms under various conditions by using realistic workloads synthetically generated from real road maps and traffic volume data. Our work continues along a number of directions, including the investigation of more optimal algorithms under the proposed framework, the study of QoS characteristics of real-world LBS applications, and how QoS requirements impact the maximum achievable anonymity level with reasonable success rate.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their constructive suggestions that helped them in improving this paper. This research was partially supporte
