pampas: privacy-aware mobile participatory sensing using …borcea/papers/ssdbm16.pdf · 2016. 6....

PAMPAS: Privacy-Aware Mobile Participatory SensingUsing Secure Probes

Dai Hai Ton That†, Iulian Sandu Popa†,‡, Karine Zeitouni†, Cristian Borcea?

†DAVID Laboratory - University of Versailles Saint-Quentin, Versailles, France‡INRIA Saclay-Ile-de-France, Palaiseau, France

?Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA{dai-hai.ton-that,iulian.sandu-popa,karine.zeitouni}@uvsq.fr, [email protected]

ABSTRACTMobile participatory sensing could be used in many appli-cations such as vehicular traffic monitoring, pollution track-ing, or even health surveying. However, its success dependson finding a solution for querying large numbers of userswhich protects user location privacy and works in real-time.This paper presents PAMPAS, a privacy-aware mobile dis-tributed system for efficient data aggregation in mobile par-ticipatory sensing. In PAMPAS, mobile devices enhancedwith secure hardware, called secure probes (SPs), performdistributed query processing, while preventing users fromaccessing other users’ data. A supporting server infrastruc-ture (SSI) coordinates the inter-SP communication and thecomputation tasks executed on SPs. PAMPAS ensures thatSSI cannot link the location reported by SPs to the useridentities even if SSI has additional background informa-tion. In addition to its novel system architecture, PAMPASalso proposes two new protocols for privacy-aware location-based aggregation and adaptive spatial partitioning of SPsthat work efficiently on resource-constrained SPs. Our ex-perimental results and security analysis demonstrate thatthese protocols are able to collect the data, aggregate them,and share statistics or derived models in real-time, withoutany location privacy leakage.

CCS Concepts•Information systems→Mobile information process-ing systems; •Security and privacy → Security inhardware;

KeywordsLocation privacy; secure protocol; distributed architecture;mobile participatory sensing; spatial aggregates

1. INTRODUCTIONThere is an increasing interest in mobile participatory

sensing for urban monitoring, which appears to be a bet-

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full cita-tion on the first page. Copyrights for components of this work owned by others thanACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-publish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

SSDBM ’16, July 18-20, 2016, Budapest, Hungaryc© 2016 ACM. ISBN 978-1-4503-4215-5/16/07. . . $15.00

DOI: http://dx.doi.org/10.1145/2949689.2949704

ter alternative to traditional infrastructure-based sensing tocope with the high installation and maintenance costs, aswell as the coverage limitation. Many projects have beenconducted recently around the world - or are still ongoing -in the area of environmental participatory sensing [15], suchas Citi-Sense in Oslo, CamMobSens at Cambridge, Met-roSense at Dartmouth, and OpenSense in Switzerland. Also,many applications that exploit the sensing features of smart-phones are already available. Examples include communitybased traffic monitoring (e.g., Waze1, or Navigon2), findingavailable parking spaces or noise mapping [9]. In addition,the emerging lightweight low-cost sensors are changing theparadigm of environmental and health monitoring3, and al-low measuring in real-time the individual exposure to envi-ronmental risk factors or the propagation of an epidemic.

In these scenarios, the community members act as mo-bile probes and contribute to spatial aggregate statistics,which in turn, benefit the whole community, e.g., dynamictraffic navigation or air quality mapping and alerts. Vari-ous statistics are of interest: basic count and density, av-erage of reported measures by location and time, or morecomplex geo-statistical operations such as spatial interpo-lation [14]. Unfortunately, most current mobile participa-tory sensing systems (MPSS) require users to reveal theirlocations to trusted monitoring servers, which raises seri-ous privacy concerns because user identity could be deter-mined based on several locations that are linked to the sameuser [7]. We should stress that, even if users might trust acentralized service, privacy violation examples are legions(see for example DataLossDB.org) coming from negligence,abusive use, internal or external attacks, and such violationsaffect even the most secured servers. In addition to location,sensing data reported by users could be privacy-sensitive aswell. These privacy issues prevent a wide adoption of MPSS.

Several works consider the MPSS privacy problem suchas [9], [5], [8], [17], [18]. However, most approaches re-quire trusting a proxy server [8], [18], while others are toocostly [9], [5], or sacrifice sensing accuracy for privacy [17].

Hence, providing a high-quality MPSS, while protectingthe users’ privacy, is still a challenge. Recently, the emer-gence of personal secure devices has opened new perspectivesin personal data protection. Be it a secure portable token [1],[19] communicating with the user’s smartphone or pluggedinside it (e.g., Google Vault4), a tamper-resistant hardware

1http://www.waze.com2http://navigon.com3http://www.epa.gov/heasd/airsensortoolbox/4http://www.cnet.com/news/googles-project-vault-is-a-

security module securing the on-board computer of a vehi-cle [10], or the secure TrustZone CPU [2] of the ARM cortex-A series equipping most of mobile devices today, all suchsecure devices offer tangible, hardware-based security guar-antees. We leverage their secure data processing capabilityin a distributed, privacy-by-design architecture, providingan alternative to the traditional server-centric architecture.Our belief is that, similar to TrustZone CPU, such securedevices will become ubiquitous in the near future, equip-ping by default users’ mobile devices. As such, there will beno need for users to buy and connect external hardware toparticipate in MPPS applications.

This paper presents PAMPAS, a Privacy-Aware MobileParticipatory Sensing system for efficient mobile distributedquery processing in the context of MPSS. The novelty ofPAMPAS is two-fold: (1) it provides a system architecturethat protects users’ location privacy by preventing locationtracking from any third-party server; and (2) it provides effi-cient aggregation protocols that satisfy the MPSS real-timeconstraints in spite of the resource limitations of secure de-vices. The privacy guarantee gives users strong incentives forparticipation [11], in addition to the benefits they get fromMPSS applications. In PAMPAS, all participants have a mo-bile device enhanced with secure hardware (i.e., any of thetypes described above), called a secure probe (SP). SPs actas probes for the target phenomenon, perform distributedquery processing, and share the results with the users. Thesecure hardware prevents users from accessing other users’data during the distributed computation. Secure probes ex-change data in encrypted form with help from a support-ing server infrastructure (SSI). To provide real-time results,PAMPAS employs efficient, parallel, location-based aggrega-tion protocols which partition the probes according to theirgeographic distribution. The construction and the mainte-nance of these partitions aim at reducing and balancing theworkloads on worker SPs, while precluding the SSI from do-ing location-based inference attacks against the participants.

We implemented and validated PAMPAS using represen-tative secure hardware platforms. We used two applicationsfor experiments, traffic and noise monitoring, with both realand synthetic datasets representing small and medium-sizecities. Using these applications, we compared PAMPASwith a state-of-the-art secure aggregation protocol describedin [19]. The experimental results show that PAMPAS out-performs this protocol in terms of latency and scalability,which translates to much lower resource utilization at theuser side.

The rest of this paper is organized as follows. Section 2discusses the related work. Section 3 describes the systemarchitecture of PAMPAS, the threat model, and the protocolrequirements. Section 4 presents the location-based globalaggregation protocol, and Section 5 describes the privacy-aware probe partitioning protocol. The security analysis ispresented in Section 6, while the experimental results areshown in Section 7. We conclude the paper in Section 8.

2. RELATED WORKTraditional system architectures used in MPSS such as [8],

[18] rely on a centralized server to collect data from mo-bile participants, process it, and publish the results. Thisserver-centric model is straightforward and easy to deploy,

security-chip-disguised-as-an-micro-sd-card/

run, and maintain. However, this basic approach also raisesserious privacy concerns and prevents a wide adoption ofMPSS. An attacker who is able to link several location re-ports to the same user can then determine the identity ofthe user by leveraging, for example, background information(e.g., user home address). Thus, an attacker can identify theMPSS participants and infer their personal habits and activ-ities [7]. In addition to location which is normally includedin MPSS reports, the sensing data reported by users couldbe privacy-sensitive as well. The works that address thisissue belong to three classes: (1) server-centric architectureand (2) cryptographic protocols for MPSS, and (3) securehardware devices in other contexts.

Server-centric approaches. Virtual trip lines (VTLs) [12]deal with the privacy issue by distributing the traffic moni-toring service implementation across several specialized serversand by providing a spatio-temporal cloaking of the usersunder the VTLs. Although the attack of a single systemcomponent prevents linking the identity and location of theusers, choosing privacy-insensitive locations for VTLs is trickyand limits the traffic information to a part of the road net-work. Also, the problem of multi-component attack (or col-lusion) remains, as well as the high cost of building such acomplex system distributed over several components. SpotMe [17]proposes a different approach consisting in mixing real user’slocation with fake locations before posting them to a centralserver. Then, the server estimates the aggregated user loca-tions by using probability theory. However, the estimationerrors can be important (around 20%), while the number ofobserved spatial units cannot exceed a few hundreds. Also,SpotMe involves higher communication costs because of thelarge number of fake locations, while linkability may still bea problem for users who send many consecutive location up-dates, which limits the usability of this approach to sporadicupdates.

By employing a fully decentralized architecture for com-putation, PAMPAS avoids all the above listed problems.Moreover, the trust is enforced by using cheap but highlysecure, tamper-resistant hardware at the user side.

Cryptographic approaches. Another way to protectthe users’ privacy is to use secure cryptographic protocols [5],[9], [16]. Typically, the cryptographic solutions are based onhomomorphic encryption schemes allowing a central-server [16]or the users [9] to aggregate the samples directly on thecyphertext. However, the cryptographic methods have toface two major limitations. First, homomorphic encryp-tion only allows the computation of basic aggregate func-tions (e.g., count, average, standard deviation), while moreadvanced functions require fully homomorphic encryptionschemes, which are not computationally feasible today. Sec-ond, even with the basic aggregate functions, the crypto-graphic protocols can incur a large computation and com-munication cost [5], [16]. Hence, the existing works typi-cally limit the size of the monitored space (e.g., the numberof roads) and the monitoring frequency. Therefore, suchsolutions cannot meet the scalability and the real-time re-quirements of MPSS at the same time, and are not genericw.r.t. the type of aggregate function.

Secure hardware approaches. Recent works have alsoproposed the use of secure hardware at the user-side [1],[19]. The trust in such a distributed architecture in which allcomputation is done by user devices arises from two sources:(i) the decentralization, i.e., there is no central-server to be

trusted or to be exposed to attacks having a large bene-fit/cost ratio; (ii) the (tamper-resistant) secure hardware atthe user-side, which protects the devices against physicalattacks (even from the device holder).

In [1], Allard et al. propose METAP, a privacy-preservingdata publishing protocol in the context of an architecturecomposed of low power secure devices and a powerful butun-trusted server in order to release sanitized data to thirdparties. However, this data publishing protocol does notconsider the case of spatiotemporal sensed values and cannotbe used in participatory sensing aggregations. To et al. [19]propose a similar architecture, but consider the problem ofexecuting SQL queries over a distributed database withoutrevealing any sensitive information to central servers. Con-sidered in our context, this protocol incurs high computationand communication costs because of the specificity of MPSSaggregates (e.g., the aggregate groups are locations, there isa high number of such groups, the computation is contin-uous and should follow the data generation frequency, theaggregate functions can be complex such as spatial interpo-lation).

PAMPAS shares the idea of employing a user-centric de-centralized architecture with the above mentioned works.However, unlike existing protocols, its secure aggregationprotocol is adapted to MPSS requirements, i.e., high dynam-icity of the participants, real-time constraints for computa-tion, complexity of the aggregate statistics, and low resourceutilization.

A centralized solution based on secure hardware couldalso be devised using recent proposals to ensure shieldedapplication execution over untrusted servers. For example,Haven [3] extends the hardware level protection features pro-vided by the Intel SGX architecture from code snippets tothe entire OS. But there are limitations: this solution slowsdown the computation substantially; the entire security ar-chitecture depends on the chip manufacturer’s ability to pro-tect the secret keys; programmers will miss certain features,such as process creation, that are not supported.

3. SYSTEM OVERVIEWThis section presents the system architecture of PAMPAS,

the threat model in our system, and the data and compu-tation model of the system. Based on these elements, wederive the requirements for the PAMPAS protocols.

3.1 System ArchitecturePAMPAS relies on a hybrid architecture combining secure

elements at the user side (secure probes – SP) and a sup-porting server infrastructure (SSI) that enables secure ex-change of messages between the mobile users. SPs and SSIjointly run two protocols to exchange sensed sample updates,continuously compute the spatially aggregated results, andperiodically partition SPs according to their location. Thisarchitecture fully protects the users’ privacy w.r.t. the SSI.Figure 1 shows the general architecture of our system in thecontext of traffic monitoring.

Compared to a purely decentralized peer-to-peer (P2P)architecture, this hybrid architecture has the salient advan-tage of not consuming any resources from the participants tomaintain the P2P overlay, which is important given the lowresources and availability of the user devices. In addition, itexchanges messages between SPs in O(1) hops as opposedto the typical O(logN) hops in P2P networks.

Figure 1: System architecture

Secure Probe (SP). Each user holds a secure portabledevice, which can be implemented by any type of (tamper-resistant) secure devices flourishing today (see Figure 2) anddescribed in Section 1. Whatever its commercial name andform factor, a secure device, called secure probe (SP) here-after, embeds at least a secure micro-controller (MCU) forcomputation (e.g., a smart card chip) connected to a largeNAND Flash memory for data storage (e.g., an SD card).An SP plays three roles: (i) mobile probe, (ii) processingnode, and (iii) query issuer. The SP sends encrypted sam-ples (containing spatiotemporal sensed measures) to SSI,participates in the data aggregation, and receives the finalresults from other SPs with the help of SSI. Given their high-level of security, SPs are considered trusted in our system.However, this feature comes at a price. The MCU usuallyhas a low power CPU and a tiny RAM (a few tens of KB).In addition, SPs have low availability since they can be con-nected/disconnected as required by the users. Therefore, allthe computation and communication with the SPs have tobe highly optimized.

Figure 2: Examples of secure tokens

Supporting Server Infrastructure (SSI). Differentfrom the typical server-centric architecture, the SSI in oursystem acts only as a coordinator for exchanging messagesbetween the SPs and for temporary storage purposes. Sincethe SSI is not trusted, all the temporary results stored inthe SSI are encrypted using non-deterministic encryption.

In conclusion, the security and privacy in PAMPAS arisefrom the combination of secure hardware with a high degreeof distribution of the architecture (i.e., all computations areexecuted by some of the SPs). The challenge is then to beable to continuously compute any type of aggregate func-tions in real-time in this user-centric architecture given thelow resources of the SPs.

3.2 Threat modelThe attackers in PAMPAS could be either users or the

owners of SSI. Their goal is to collect private user infor-mation (e.g., location or sensing data). Using this privateinformation, attackers can determine the user identities andlearn their activities and behaviors. Our goal is to ensurethat users cannot read the raw data reported by other users.The SSI must not be able to read the user raw data. Also,the SSI must not be able to infer any additional location in-

formation about the participants more than it already knowsor could be inferred from the aggregate result. Hence, thescope of PAMPAS is to fully protect the raw data and theaggregation process, and does not consider the privacy ex-posure risks that arise from analysing the aggregate results,which is a complementary aspect of this work.

Even though the users are untrusted, we assume that allthe SPs are trusted, which is reasonable considering thatthe tamper-resistance of the MCU provides a high level ofprotection against physical and side-channel attacks, andin particular for the data residing in RAM since the RAMmemory is located inside the MCU. We also assume that thehardware manufacturer is trusted and protects the secretsembedded in SPs. In addition, all the persistently storeddata in the NAND Flash is cryptographically protected.

Furthermore, we assume an honest-but-curious threat model,i.e., the SSI obeys the protocol it is supposed to do, butmay try to infer anything it can from the data or behaviorsis sees. Considering a malicious SSI (i.e., the server tam-pers with the protocol, e.g., by dropping messages to infermore information) is of little interest, since a malicious SSIcan be easily detected (e.g., the SPs that aggregate the dataverify if their own samples are present in the data sent bythe server) leading to critical financial/legal consequencesfor the service.

Finally, we assume that the communication between SPsand SSI is anonymous, e.g., by using a proxy forwarder oran anonymization network (e.g., Tor) We assume such sys-tems are able to hide the packet origin from an adversary, sothat privacy cannot be compromised by a malicious serversearching to recognize the origin of the uploaded messages.Let us emphasize that IP anonymity is not enough to pro-tect the user privacy in MPSS because identity informationcould be determined from the location and sensing data.

3.3 Data and Computation ModelsData model. PAMPAS is designed to be generic with

respect to the type of computation required by participa-tory sensing applications. In most cases, such applicationsrequire the aggregation of geo-localized and time-stampedsensed values collected by the sensing devices of the partic-ipants. Therefore, a participant’s device periodically gener-ates an update in the form sample = (location, time, value),which is encrypted and sent to the SSI. PAMPAS does notimpose any restriction on the generation frequency of sam-ples, which may depend on the application sample genera-tion policy. However, the system should be efficient and scal-able for a large number of participants and a high generationfrequency of samples. Also, the participants’ privacy shouldbe fully protected independent of the number and spatiotem-poral distribution of the samples. Furthermore, PAMPASconsiders two types of locations corresponding to the twotypical types of movements of users: (i) free movements inthe two-dimensional space, i.e., location = (x, y); (ii) move-ments constrained by a transportation network (e.g., roador railroad network), i.e., location = (rid, pos), where ridis the road identifier and pos is the relative position on theroad. Finally, the value corresponds to the sensed measure(e.g., traffic speed, noise level, etc.).

Query model. Given a stream of samples and an aggre-gate function, PAMPAS produces a spatiotemporal aggrega-tion of the sample stream such as the stream-SQL-like [13]query formulation in Figure 3. The aggregation is tempo-

ral since the result is computed continuously over time aslong as it is required or whenever the number of partici-pants exceeds some predefined threshold. In this way, thespatiotemporal evolution of the measure of interest is mon-itored over time. To this end, PAMPAS divides the streamusing a sliding time window (see Figure 3) and computesan aggregate result based on all the samples generated inthe time window. The final aggregation result is a spatialfunction representing the evolution in space of the observedmeasure in the respective time window. For instance, the re-sult can be the noise heat-map in the covering area of a cityor the average travel time in a road network. As with theduration of observation, we do not impose any restrictionsregarding the extent of the observed space.

Figure 3: Spatial-temporal aggregates in PAMPAS

Spatial units. As shown in the above query, spatial ag-gregates are based on a discrete referential space, i.e., a fi-nite set of spatial units. Without loss of generality, we con-sider two types of referential spaces corresponding to the twotypes of users’ movements. In the case of free movement, weconsider a uniform grid and each grid cell corresponds to aspatial unit. The size of the units is determined based on theapplication requirements, space size, number of participants,etc. In the case of constrained movement by a transporta-tion network, we consider that a spatial unit corresponds toa network (road) segment, i.e., the network path connectingtwo adjacent network nodes (e.g., the road segment betweentwo intersections). In both cases, the number of spatial unitscan be large (e.g., hundreds of thousands). The COUNT inthe query model is optional and is required in the aggrega-tion protocol to check the probes partitioning.

Aggregate functions. PAMPAS can compute most typesof aggregate statistics required by participatory sensing ap-plications. Practically, our system can compute in real-timeany type of function having reasonable time and space com-plexity given the relative low CPU power and little RAMof the SP. For illustrative purpose, we consider three classesof functions in this paper: (i) Typical algebraic functions:count, sum, average, standard deviation. Such aggregatefunctions are the most popular in the works related to par-ticipatory sensing [9], [5], [16]. These functions allow forexample to compute the average travel time or the trafficdensity in a road network; (ii) Specific functions: inverse dis-tance weighting (IDW). For instance, an application mon-itoring the noise pollution in the city could use the IDWfunction to compute a heat-map of the noise level in the en-tire space [14]; (iii) Holistic functions: median, percentile,top-k. Such functions are also frequently used in statisticalcomputations. Their particularity is that the computationof the result requires accessing the entire sample set andcannot be achieved incrementally by accessing only subsetsof samples as with the previous two classes of functions.

An important observation is that cryptographic solutionsbased on homomorphic encryption cannot be applied for spe-

Table 1: Notations used in the algorithms

Notation DescriptionGi Identifier of probe group iEk Symmetric deterministic encryptionnEk Symmetric non-deterministic encryption

E−1k Symmetric deterministic decryption

nE−1k Symmetric non-deterministic decryption

P fakeGi

Probability to send a fake message ingroup i

N Number of spatial unitsNG Number of probe groups

QIcommDegradation factor of thecommunication time

QIcompDegradation factor of the computationtime

Comp timei Computation time for the group i

cific or holistic functions (see Section 2). Also, the holisticfunctions cannot be computed efficiently in a distributed ar-chitecture by the secure protocol proposed in [19] (as shownin Section 7).

3.4 Protocol RequirementsIn the light of the above description of the proposed user-

centric architecture, the PAMPAS protocols have to dealwith the following challenges: (i) Privacy : By keeping allthe sensitive data in the SPs, the adopted user-centric archi-tecture matches this requirement in contrast with a server-centric architecture. In short, the computation protocolshould not reveal to the SSI any additional information aboutthe participants’ paths besides what the SSI can infer fromthe aggregate result. (ii) Generality : the protocols should beable to compute any type of function over the spatiotempo-ral sensed measures by the mobile users and covering a largeobservation space. This is different from the works basedon cryptographic approaches in which, typically, only basiccomputation (e.g., simple aggregates like sum, average) canbe achieved and only in specific locations over limited peri-ods of time. (iii) Efficiency : the protocols should be highlyefficient to be able to continuously compute the aggregateresults in real-time with very limited resources. Indeed, foreconomic and security reasons, the SPs used for data pro-cessing have low resources and limited availability. Hence,it is necessary to minimize the computation and communi-cation costs of the PAMPAS protocols. (iv) Scalability : theprotocols should allow PAMPAS to scale to a large numberof participants (e.g., up to millions of users), high samplingfrequencies, and very large regions. (v) Accuracy : PAM-PAS should continuously reflect the sensed measures withgood precision. In other words, protecting the users’ pri-vacy should not impact the accuracy of the aggregate resultcomputed by the protocols.

4. GLOBAL AGGREGATION PROTOCOLThe global privacy-preserving protocol in PAMPAS con-

sists in three phases that are repeatedly executed in pipeline(see Figure 4). First, the SSI collects all the sensing updatessent by the participants for a period equal to the slidingtime window (i.e., the collection period). Each update isencrypted using symmetric non-deterministic encryption so

Algorithm 1: PAMPAS Protocol at the SSI-side

1 collection period():2 /* Receive encrypted updates from SPs */3 while (true) do4 message = (Ek(Gi), nEk(sample))5 store(message)→

list[Ek(Gi)][currentT imeWindow]

6 processing period():7 foreach i in {Ek(Gi)} do8 /* feed in parallel the randomly selected SPs */9 randomly select SPi ∈ Ek(Gi)

10 whilemessage← list[Ek(Gi)][lastT imeWindow] do

11 send(message, SPi)

12 foreach i in {Ek(Gi)} do13 /*Receive the final results from worker SPs*/

14 enc resultfinali = (Ek(Gi), nEk(result))

15 delivery period():16 foreach i in {Ek(Gi)} do17 /*Push resulti to all requesting SPs*/

18 multicast(enc resultfinali , {SPk})

that the SSI cannot gain any knowledge from these updates.All the SPs share a secret key, which is renewed periodi-cally to increase security. The key is generated randomlyby a randomly chosen SP. To distribute the secrete key, weassume the users authenticate using a typical PKI infras-tructure, i.e., a certificate is embedded in each user securehardware. Whenever a new SP connects to the system, itauthenticates using its certificate. Then, the SP randomlycontacts another connected SP, which sends back the cur-rent shared secret key encrypted with the public key of thisnewly connected SP.

The shared secret key is used by the SPs to symmetricallyencrypt the update messages (e.g., by using AES encryption)so that any SP can decrypt messages and aggregate the data.Note that, although an SP can decrypt the updates, a useris not allowed to access the decrypted data in her SP andthat the tamper-resistant hardware protects the transitingdata from the user. Therefore, as for the SSI, the users haveaccess only to the final results and not to the raw data.

At the end of the collection period, the SSI triggers theprocessing period. In this phase, only a small subset of SPs,which are randomly selected by the SSI, are involved. TheSSI partitions the collected samples such that the numberof updates in a partition can fit the RAM resources of anSP (otherwise, the persistent Flash storage of the SP has tobe used incurring a much higher computation cost). Then,each sample partition is sent to an SP, which computes apartial aggregate result for the received updates. The en-crypted results are sent back to the SSI. Finally, the deliv-ery period consists of delivering the current partial aggregateresults to the queriers. Each querier needs to perform thefinal aggregation of these partial results, which is merely aconcatenation of the demanded partial results.

Algorithms 1 and 2 give the detailed description of theoperations executed by the SSI and the SPs respectively.In the following, we denote by Ek and nEk the symmetric

Figure 4: Workflow representation of the global protocol in PAMPAS

Algorithm 2: PAMPAS Protocol at the SP-side

1 collection period(): /* for all SPs */2 /* Generate and send the sensing update:

update(Gi, sample) */3 message = (Ek(Gi), nEk(sample))4 send(message, SSI)5 /* Send a fake sample to the SSI with probability

P fakeGi

*/

6 if rand(0, 100) >= P fakeGi

then7 fake message = (Ek(Gi), nEk(fake sample))8 send(fake message, SSI)

9 processing period(): /* only for the selected SPs,one for each Gi */

10 while message = receive(SSI) do11 sample = nE−1

k (message)12 result = result⊕ sample

13 enc resultfinali = (Ek(Gi), nEk(result))

14 send(enc resultfinali , SSI)

15 delivery period(): /* for all SPs */16 /* Pull the results for required {Gi} from the SSI */17 foreach i in {Ek(Gi)} do18 send request(Ek(Gi), SSI)

19 resultfinali = nE−1

k (receive(SSI))

deterministic and non-deterministic encryption with the keyk, and by E−1

k and nE−1k the opposite decryption operations

while Gi indicates the identifier of group i. All the notationsused in the algorithms are listed in Table 1.

To address the performance limitations of the existingprotocols [19] (see Section 2), the aggregation protocol inPAMPAS groups the participants based on their location,which permits processing together the generated samples ina group by a single SP. To this end, the users also send thedeterministically encrypted value of the spatial unit they arecurrently located in, in addition to the non-deterministicallyencrypted value of the sample, i.e., message = (Ek(groupID),nEk(sample)) (Algorithm 1, lines 4-5 and Algorithm 2, lines3-4). Consequently, the SSI can group the messages basedon the encrypted unit number and then send each group ofsamples to a different SP for aggregation (lines 7-11 in Al-gorithm 1 and lines 10-12 in Algorithm 2). By doing so, theadvantage is manifold. First, the processing period is guar-

anteed to terminate in a single iteration, since each involvedSP produces directly the aggregation result correspondingto a spatial unit. This greatly improves both the computa-tion and the communication cost of the aggregation process.Second, data processing by an SP is also efficient since onlyone aggregate is computed, which greatly reduces the RAMrequirements and avoids/reduces the usage of the persistentstorage. Third, the final aggregate result is also partitionedand the queriers can demand the results only for specific spa-tial units, which further improves the communication cost.Furthermore, in order to avoid leaking information regard-ing the spatial distribution of users, the SPs also generateand send fake messages to the SSI (see Algorithm 2, lines 6-8). The rational and detailed explanation for this techniqueare discussed in the next section.

However, despite all these benefits, the above approachhas one fundamental shortcoming originating from the skewedspatial distribution of the participants. Although the exactlocation of the updates and the unit ID are hidden, the SSIknows the number of participants in each spatial unit. If theSSI has access to side information about the spatial distribu-tion of the users (e.g., global traffic density information), itmay use this information to infer the (approximate) locationof the participants and compromise their privacy.

5. PROBE PARTITIONING PROTOCOLTo counter the privacy threats that are rooted in the

skewed spatial distribution of the participants, PAMPAScontinuously partitions the set of probes based on their cur-rent location and the spatial units of the query. Similarto the global aggregation protocol, this privacy-aware par-titioning protocol is executed by SPs. The idea is to groupSPs located in adjacent spatial units such that the resultedprobe groups have approximately the same size. Therefore,in PAMPAS a group Gi covers several spatial units (as de-fined in Section 3.3) and includes all the SPs in these units.

The probe partitioning has to be recomputed periodicallyto keep the groups balanced since the users’ distribution inspace changes over time. Moreover, the groups should con-tain users located in closely situated spatial units to maxi-mize the lifetime of a partitioning. The challenge is to im-plement a partitioning algorithm that can be executed peri-odically at SPs because the typical spatial partitioning algo-rithms are much too costly to be considered in our context(i.e., limited-resources SPs).

Our algorithm is based on the following idea. We use a

Figure 5: Hilbert indexing of spatial units

space-filling curve to index the spatial units of the applica-tion query. A space-filling curve has the property to mapa multidimensional space to a one-dimensional space suchthat, for two objects that are close in the original space,there is a high probability that they will be close in themapped target space. Then, we sort the spatial units onthe space-filing curve index. Once sorted, an approximatebalanced grouping can be checked and computed in O(NG)space complexity and O(N) time complexity, where NG isthe number of probe groups, and N is the number of spatialunits.

Indexing the spatial units. In our system, we useHilbert curves, but other types of space-filling curves canbe used as well to index the spatial units considered by theparticipatory sensing application (e.g., z-curves). In the caseof free movement, the indexing is straightforward since thespace is already partitioned with a uniform grid (see Figure 5left). Then, we cover the grid cells with the Hilbert curvecorresponding to the grid granularity and label each cell withthe obtained Hilbert index. In the case of constrained move-ment, the indexing requires two steps. First, we cover thetransportation network with a uniform grid (see Figure 5right). The grid granularity is chosen such that the num-ber of network segments (see Section 3.3) intersecting witha grid cell is low for most of the cells. Then, the grid cellsare indexed with a Hilbert curve and each network segmentis labeled with the Hilbert index of the cell containing thesegment center. In case several segments are contained bya cell, the segments are sorted by the x-coordinate and they-coordinate of their centers and labeled accordingly. Oncethe spatial units are indexed, they are sorted on the indexvalue and the sorted unit vector is broadcasted to all theparticipants to be used in the probe partitioning phase.

Checking and repartitioning the probe grouping.Periodically, our system verifies if the current probes parti-tioning is still balanced with respect to the number of probesin each group. The verification frequency depends on the dy-namicity of users in space. In PAMPAS, the checking andrepartitioning processes can be executed often (i.e., everyfew seconds) due to their low cost. When a partitioningchecking is triggered, the system computes a count aggre-gate in addition to the application aggregate function (seeFigure 3), which gives the actual number of users (SPs) in allthe spatial units. The count aggregate result is then pushedto an SP randomly chosen by the SSI. The checker SP de-crypts the results and updates the weights5 of the sortedspatial unit vector (lines 4-7 in Algorithm 3). This opera-tion has O(N) complexity assuming that a small index con-

5The weight is the number of probes in a spatial unit.

Algorithm 3: Checking probe partitioning (SP-side)

1 check probe partitioning():/* one randomlyselected SP */

2 /* Pull all the results from the SSI */3 foreach i in {Ek(Gi)} do4 send request(Ek(Gi), SSI)

5 enc resultfinali = receive(SSI)

6 allCounts[Gi][]← E−1k (enc resultfinal

i )7 update localy stored counts for spatial units

/* also required to compute the probability togenerate fake samples */

8 compute weights[Gi] = SUM(allCounts[Gi][])

9 compute standard deviation(weights)10 if standard deviation(weights) < threshold then11 send for broadcast(nEk(allCounts), SSI)

12 else13 execute probes repartitioning()

taining the partitions frontiers is kept in memory by theSP (which requires only NG Flash addresses to be kept inRAM). At the same time, the SP computes in memory thecount by group (since the groups are sent one by one by theSSI, line 8 in Algorithm 3) and compares the counts. If thebalancing of the current probes partitioning is within thepredefined limits, the checker SP sends the current values toall the other SPs (i.e., exchanged encrypted through SSI),which update the weights of the spatial units with the newcount values. Otherwise, the checker SP computes a newpartitioning.

Once the sorted vector of spatial units is updated with thenew weight values, the probe repartitioning can be efficientlycomputed in O(N) and O(NG) time and space complexityrespectively (see Algorithm 4). To set the partition borderswe use a greedy algorithm, which adds spatial units to agroup as long as the total weight of the group is lower thana threshold value (lines 12-16 in Algorithm 4). The thresh-old is computed as the ratio between the total number ofprobes and the number of groups (line 10 in Algorithm 4),and represents the average number of users per group. Thepartitioning result is a list of NG milestones indicating thegroup borders in the sorted index of spatial units (line 15 inAlgorithm 4). The result is then encrypted and delivered,through SSI, to all users, which update their partitioningdata and generate new samples accordingly starting fromthe next computation window.

The proposed probes partitioning algorithm has low com-plexity and can be efficiently executed even with the lowresources of an SP. However, the partitioning algorithm can-not guarantee a certain degree of balancing of the partitionweights. Yet, the partitioning balancing is required to avoidleaking any information regarding the spatial distribution ofusers. To deal with this problem, the SPs generate fake sam-ples in all the probe groups having a number of users lowerthan the maximum size group. Therefore, in the collectionperiod of each computing time window, an SP sends proba-bilistically a dummy sample in addition to the real sample.The probability to send a fake sample is proportional to thedifference between the maximum size group and the numberof users in the SP’s group, and inversely proportional to the

Algorithm 4: Repartitioning process (SP-side)

1 PROBE REPARTITIONING():/* one randomlyselected SP */

2 compute QIcomp and QIcomm for current NG

3 while true do4 /* adjust the number of groups NG */5 if QIcomp > QIcomm then6 tNG = 2 ∗NG

7 else8 tNG = NG/2

9 /* repartition for tNG */10 avgGroupWeight = SUM(allCounts[])/tNG

11 currentGroupWeight = 012 for i = 0 to N − 1 do13 currentGroupWeight+ = allCounts[i]14 if currentGroupWeight ≈ avgGroupWeight

then15 newPartitionMilestones[].add(i)16 currentGroupWeight = 0

17 /* check if the new partitioning for tNG isbetter than for NG*/

18 compute tQIcomp and tQIcomm for tNG

19 if tQIcomp + tQIcomm < QIcomp + QIcomm

then20 NG = tNG; QIcomp = tQIcomp

21 QIcomm = tQIcomm

22 else23 break

24 message = allCounts[]||newPartitionMilestones[]25 send for broadcast(nEk(message), SSI)

number of users in the group (see Algorithm 2, lines 6-8).The same approach is used to hide the number of spatialunits in each group. At the end of the aggregation phase,each aggregating SP adds to the result a number of fake val-ues equal to the difference between the maximum number ofunits in the groups and the number of units in the currentgroup. In this way, all the partial aggregate results receivedby the SSI have the same size and the SSI cannot infer thenumber of cells in any group. Note that the fake values arefiltered out by the worker or querier SPs and therefore haveno impact on the accuracy of the results.

QIcomp = Maxi=1,NG [Comp timei]−Maxj=1,N [Comp timej ] (1)

QIcomm =size(sample)

bandwidth

NG∑i=1

{Maxj=1,NG [Countj(probes)]−

Counti(probes)}+

size(sample)

bandwidth

NG∑i=1

{Maxj=1,NG [Countj(spatialUnits)]−

Counti(spatialUnits)} (2)

Choosing the Number of Probe Groups. The costof the aggregation protocol is composed of the computation

cost at the SP side and the communication cost betweenthe SSI and the SP. The number of probe groups impactsboth the computation and the communication costs. Specif-ically, the computation cost decreases with the increase inthe number of groups and attains the minimum value whenthe number of groups is equal to the number of spatial cells,i.e., an SP is used to aggregate the samples for each spatialunit. But increasing the number of groups leads to a higherimbalance in the groups’ weights, which in turn requires in-jecting more fake samples and enlarges the communicationcost. Therefore, modifying the number of groups has oppo-site effect on the computation and the communication cost.

PAMPAS computes two quality indicators to measure theimpact of the number of groups on the computation andcommunication costs, i.e., QIcomp and QIcomm, as definedby Formulas (1) and (2). QIcomp estimates the degrada-tion of the computation time at the SP side generated bythe fact that several spatial cell aggregates are delegated toone SP instead of using one SP for each cell. Estimatingthe computation time is fairly simple since the time is typi-cally linear with the number of samples to be processed bythe SP, assuming that the aggregation can be entirely pro-cessed in RAM. However, the cost model can be extendedto the case in which it is required to access the secondarystorage. QIcomm estimates the degradation of the commu-nication cost caused by the imbalance of the group weights.The first term indicates the overhead incurred by the fakesamples generated to balance the groups, while the secondterm measures the overhead of generating fake results tobalance the number of aggregate results in each group.

Each time an SP computes the probe partitioning, it alsocomputes the values of QIcomp and QIcomm (line 2 in Algo-rithm 4). If QIcomp > QIcomm, the SP multiplies by two thenumber of groups and re-partitions the probes. If QIcomp <QIcomm the SP divides by two the number of groups and re-partitions the probes (lines 5-8 in Algorithm 4). The SP con-tinues to adjust the number of groups until QIcomp+QIcomm

has minimum value (lines 19-23 in Algorithm 4), meaningthat the aggregation cost is near optimal. Thus, this pro-cess allows adapting the number of groups to the numberand the spatial distribution of the probes.

6. SECURITY AND PRIVACY ANALYSIS

6.1 Security AnalysisThe users cannot read the raw data of other users because

the data stored in memory is protected by the secure MCU(i.e., the RAM is located inside the MCU) and the datastored in NAND Flash are cryptographically protected.

The SSI does not have the encryption key, so it cannot ac-cess the transiting data. In addition, the non-deterministicencryption protects the data against frequency-based at-tacks. The SSI may also try to buy an SP and pass fora user to gain access to the shared encryption key. How-ever, this would be useless since the tamper-resistance of anSP protects the key. The SSI could collude with a querier,but it will gain access only to the aggregate result. Finally,since the samples are communicated through anonymizers,the SSI cannot identify the senders or link consecutive mes-sages from the same user.

The SSI could try to infer information from the deter-ministically encrypted group ID values. Nevertheless, theSSI cannot perform a frequency-based attack using the en-

crypted group ID, since all the groups contain approximatelythe same number of messages. Therefore, the SSI cannot in-fer the corresponding (approximate) location of a group orthe topological neighborhood of the groups (which wouldbe the first step to attack the users’ privacy). Hence, theonly knowledge the SSI acquires is the number of groups andits evolution over time, which does not endanger the users’privacy. Note that even if the SSI has somehow access tothe full partitioning information and the corresponding en-crypted ID, a user is still hidden under the correspondinggroup area and within the crowd in the same group (let usrecall that the messages are sent anonymously so it is hardfor the server to link the messages coming from the sameuser). Hence, the protocols are secure and fully protect theprivacy of the users.

Although, protecting the privacy of users beyond the ag-gregate results is out of the scope of this paper (as discussedin Section 3.2), one can easily integrate basic protectionmechanisms in PAMPAS for such cases. For example, toavoid the risk of exposure for the users situated in verysparse areas (e.g., a single user or very few users located in aspatial unit), we can simply add a predicate in the HAVINGclause of the aggregate query (see Figure 3) indicating theminimum number of users in a spatial unit. In this way, thesparse spatial units are eliminated from the aggregate re-sults. Another solution is to increase the size of the slidingwindow, or of the spatial units accordingly.

6.2 Privacy AnalysisTo underline the high level of privacy protection of PAM-

PAS w.r.t. the SSI, we consider an entropy-based metricand apply it in the context of two scenarios that are relatedto our architecture. We then compare the privacy leakage inthese two scenarios with the privacy leakage in PAMPAS.

Entropy is a popular metric to describe location privacy ingeneral [6], and it is also appropriate to describe the privacy-preserving mechanism of PAMPAS. Commonly, entropy isused to quantify the average degree of uncertainty associ-ated with a set of events. In the case of location privacy,the idea is to prevent user identification by obfuscating herexact location in a spatial region containing a certain num-ber of individuals. Therefore, the level of privacy is directlyrelated to the popularity (i.e., number of individual foot-prints) of the region. This means the higher the popular-ity, the higher the privacy level of the users in that region.Then, entropy is used to quantify the degree of popular-ity of a region. Formally, let reg be a spatial region andlet U(reg) = {u1, u2, . . . up} be the set of users in regionreg. Let fi (with 1 ≤ i ≤ p) be the number of sampleupdates (i.e., footprints) that user ui sends from reg andF =

∑pi=1 fi be the total number of sample updates sent

from reg.

Definition 1. Entropy of a region: the entropy of region

reg is defined as: E(reg) = −∑p

i=1

fiF.log

fiF

Definition 2. Popularity of a region: the popularity of re-gion reg is defined as: Pop(reg) = 2E(reg)

Definition 3. Privacy leakage: the privacy leakage for eachupdatek sent by user ui is defined as:

priv leakui(updatek) =1

Pop(location of updatek)

To compute the privacy leakage in different cases, we con-sider a simple numerical example inspired by the datasetsused in our experimental evaluation (see Section 7.1). Letus consider that 200 thousand users participate in a mobilesensing application that aggregates data over 20 thousandspatial units (e.g., road segments in a road network). Tokeep the formulas tractable (but without loss of generality),let us consider that each user produces 50 samples from 50distinct spatial units. This implies that on average, thereare 500 footprints (i.e., updates) in each spatial unit.

Scenario 1: there is no grouping of the probes. Eachparticipant sends the non-deterministically encrypted valueof a sample together with the deterministically encryptedvalue of the spatial unit identifier to allow an efficient ag-gregation of the data. However, no fake sample is insertedby the probes. Although the spatial unit identifiers are en-crypted, the SSI could easily determine the location of thespatial units if it has access to the global spatial distributionof the probes (i.e., a frequency-based attack). In this case,the average entropy of a spatial unit by applying Defini-tion 1 is E(s.unit) = −

∑500i=1

1500

.log 1500

= log(500), whichgives a popularity of Pop(s.unit) = 500 and an average pri-vacy leakage of priv leak = 0.002 for each update. Clearly,the privacy leakage can be lower or higher for each spatialunit depending on the popularity value compared with theaverage value.

Scenario 2: there is a static partitioning of probes, i.e.,the spatial units are statically partitioned into a numberof groups containing closely located spatial units. As inthe previous case, the probes send the deterministically en-crypted value of the spatial group and are also exposed toa frequency-based attack from the SSI. However, groupingmany spatial units leads to decreasing the privacy leakagerisk (but at the cost of increased aggregation time). Forinstance, partitioning the spatial units in 200 groups (i.e.,100 spatial units per group), leads to an average popular-ity Pop(group) = 103 and thus an average privacy leakagepriv leak(update) = 10−3, which is smaller than in the pre-vious scenario. Also, the obfuscation region is much largersince it corresponds to 100 spatial units instead of one.

PAMPAS goes even further in the protection of the par-ticipants’ privacy by using a dynamic partitioning of theprobes based on their location and spatial distribution. Theadaptive partitioning produces nearly balanced groups ofprobes. In addition, the eventual imbalance of the groupsis corrected by injecting fake tuples, which precludes theSSI doing frequency-based attacks. This means that it isextremely difficult for the SSI to estimate even the approx-imate corresponding area of each group. Therefore, in ourcase, the entropy applies indistinguishably to all the partic-ipants leading the a popularity Pop(group) = 2 · 106 and anaverage privacy leakage priv leak(update) = 5 · 10−7. Prac-tically, in PAMPAS, the privacy leakage depends only onthe total number of participants, while the obfuscation areacorresponds to the entire observation space. Besides, thenumber of groups is adaptively chosen such as to minimizethe aggregation cost.

7. EXPERIMENTAL EVALUATIONThe goals of our experimental evaluation are twofold: (i)

compare the execution time and scalability of PAMPAS withthose of a state-of-the-art protocol described in [19]; (ii)quantify the cost and scalability of our partitioning pro-

Figure 6: Aggregation maps for two applications: noise mon-itoring (left) and traffic monitoring (right)

Figure 7: Secure tokens

tocol. We implemented and validated PAMPAS throughemulations using secure tokens which have a hardware con-figuration representative for secure hardware platforms. Asapplications for our experiments, we used traffic monitoringand noise monitoring with both real and synthetic datasetsrepresenting small and medium-size cities. Figure 6 illus-trates our graphic interface for these applications; it showsthe aggregate results for the noise heat-map and the averagetravel time for the road network. A demo of our prototypewas presented in [20] using a traffic monitoring scenario.

7.1 Experimental SettingHardware platform. In our experimental evaluation,

the SSI is hosted on a PC (3.1 GHz i5-2400, 8GB RAM,running Windows 7) which also displays the aggregate re-sults in a graphical form for validation purpose. The SPs areimplemented by representative secure hardware devices (seeFigure 7) which includes an MCU with a 32-bit RISC CPUat 120MHz, a cryptographic co-processor implementing AESand SHA, 128KB of static RAM and 1MB of NOR Flashto store the software stack, a smartcard chip hosting thecryptographic credentials (i.e., the secrete encryption keys)and an SD card reader allowing for a large storage capacity.We used a commodity SD card (Samsung SDHC EssentialClass 10 of 32GB) as secondary Flash storage. The SSI inour testing system manages a multi-channel Ethernet con-nection with a global bandwidth of 100Mbps. Importantly,on the SP’s side, our implementation limits the RAM con-sumption to only 30KB and the maximum communicationbandwidth to 200Kbps to validate the proposed protocolswith less powerful secure devices. Thus, in our experiments,all the SPs have this minimalist configuration. To emulatea very large number of SPs, we execute sequentially on anSP the aggregate computations and communications for allthe worker SPs and measure the ”parallel” execution time asthe maximum aggregation time in the execution sequence.

Baseline system. To underline the importance of thePAMPAS protocols, we implemented the secure protocol pro-posed in [19] and took it as the baseline. This protocol canbe applied without modification to aggregate the samples

collected in each time window. Since PAMPAS offers thesame level of security and privacy as the baseline protocol,our experimental evaluation focuses on the efficiency part.Note that in [19], two more protocols are proposed that areeven more expensive than the secure protocol if consideredin our context.

Datasets and aggregate functions. We use both syn-thetic and real datasets to test the efficiency and scalabilityof PAMPAS. We employed the well-known Brinkhoff genera-tor [4] to generate mobility traces on two real road networksof the cities of Oldenburg (Germany) and Stockton (SanJoaquin County, CA). Oldenburg is a small size networkhaving 7035 road segments, while Stockton is a medium sizeroad network having 24123 segments. Depending on the net-work size, we generated traces corresponding to a mediumand large number of users. With Oldenburg, the mediumand large datasets contain 47 thousand and 270 thousandusers respectively. With Stockton, the medium and largedatasets contain 200 thousand and 1.35 million users respec-tively. The spatial distribution of the traces follows the net-work spatial density. Compared to the existing real datasets,the synthetic datasets have the prominent advantage of hav-ing excellent spatial and temporal coverage. However, it isalso important to validate the proposed protocol with realdatasets. To this end, we used the T-Drive Taxi trajectorydataset [21]. This dataset contains around 15 million trajec-tory units collected from 10357 taxis over a period from Feb.2 to Feb. 8, 2008 in Beijing. Because the density of taxis istoo low compared to the synthetic dataset, we extracted andmerged a period of one hour in our tests, in order to gener-ate a dataset containing 191 thousand trajectories covering32800 road segments.

To show the generality of PAMPAS, we selected three ag-gregate functions, i.e., average, IDW [14] and median, cor-responding to the three aggregate types described in Sec-tion 3.3. We associate the average and median aggregateswith the traffic monitoring application, i.e., compute theaverage travel time and the median speed for each road seg-ment in a road network. Hence, these two scenarios considerthe constrained movement type. The IDW aggregate is as-sociated to the noise-level monitoring application and a freemovement type. In this case, we use the same generatedmobility traces, but consider them in the 2D space insteadof the network space. Also, we use a 64x64 grid to dividethe observed 2D space into 4096 spatial units for the freemovement scenario. The speed sample values are directlygenerated by the moving objects generator, while the noisevalues are generated by us proportionally to the number ofprobes in the spatial unit.

7.2 Performance EvaluationExecution time. Figure 8 shows the aggregation time

(in a logarithmic scale) for the three functions of both Base-line and PAMPAS protocols with 191 thousand probes inBeijing and with 200 thousand probes in Stockton. Theaggregation time is global, i.e., it includes both the compu-tation and communication time. The results indicate thatPAMPAS is very efficient since it requires only a few sec-onds to compute the aggregate results for all the tested func-tions in both datasets. Also, the aggregation times of PAM-PAS are similar between the real and the synthetic datasets.However, in both cases the baseline protocol is much morecostly (especially for complex aggregate functions) leading

Figure 8: Aggregation time of PAMPAS and Baseline pro-tocols in real dataset (top) and synthetic dataset (bottom)

to aggregation times up to three orders of magnitude higherthan PAMPAS. Moreover, the aggregation times with thebaseline protocol are larger for the Beijing dataset. Theexplanation is that the number of spatial units is larger inBeijing (i.e., 32800) than in Stockton (i.e., 24123). On theother hand, PAMPAS is scalable with respect to both theaggregation function and the number of spatial units in thequery.

Scalability. We further test the scalability of the pro-tocols with different number of probes, spatial units, andaggregate functions. Figure 9a shows the aggregation timefor the two protocols for the average (top graph) and median(bottom graph) functions with medium and large number ofusers on both road networks. The results confirm that onlyPAMPAS is scalable w.r.t. all the varying input parameters.In the worst case, the computation time attains 14 secondsto compute the median speed for 1.35 million samples cov-ering 24123 spatial units.

The baseline protocol does not scale with the number ofsamples and especially with the number of spatial units.Practically, the baseline can provide real-time aggregationonly for a small number of spatial units (i.e., 7000 in Old-enburg) and basic aggregate functions (e.g., average). Thevery limited RAM of the SPs and the impossibility to effi-ciently parallelize the aggregate computation make the base-line inadequate for the requirements of participatory sensingapplications.

Cost and scalability of partitioning protocol. Fig-ure 9b (top) presents the partitioning computation time forboth Oldenburg and Stockton networks. A new partitioningcan be computed in a few seconds by an SP. This meansthat the checking and probes re-partitioning can be exe-cuted frequently, which allows PAMPAS to adapt to evenfast changes in the spatial distribution of the probes. Mostof the partitioning cost resides in reading and writing thepartitioning data to the secondary Flash storage. This alsoexplains the increase of the partitioning time with the num-ber of partitions, since in this case the I/O operations areexecuted at a smaller granularity, which is more costly.

Figure 9b (bottom) indicates that the partitioning unbal-ance factor, i.e., the ratio between the maximum and the av-erage partition size, increases with the number of partitions.The unbalance factor is an important indicator in PAMPASsince the higher the unbalance, the higher the number of fakeinjected samples and, therefore, the communication cost.

Figure 9c shows the impact of the number of partitions onthe global aggregation time as well as on the computationand communication cost, which compose the total time. Thecomputation time decreases with the increase of the numberof partitions since the amount of work done by the aggre-gation SPs also decreases. Conversely, the communicationtime increases with more partitions since more fake samplesare injected into the system as explained above. Globally,the near-optimal aggregation time is obtained with a num-ber of partitions that minimizes the cumulated degradationof the computation and communication costs (see Section 5).We obtained similar results with the real dataset, for whichthe optimal number of partitions is 100 while the networkpartitioning is computed in just 2 seconds. The aggregationcosts are partially shown in Figure 8 (top). Given the spacelimitation and the similarity of the results with the syntheticdatasets, we omit here the details of the results with the realdataset.

Discussion. It is worth mentioning that the aggregationtime can be greatly improved by increasing the processingpower and the communication bandwidth of the SSI. Forexample, increasing the server bandwidth from 100Mbps to1 GBps, makes the maximum aggregation time (i.e., me-dian function with the large Stockton dataset) to drop from14 seconds to less than 7 seconds. Also, in some scenarios,pushing the computation in the user devices may be prob-lematic (e.g., battery powered devices, concurrent applica-tions running in the device). However, PAMPAS minimizesthis type of problem thanks to its design and its high ef-ficiency. For instance, in our tests, a user participating inthe system for one hour, has a probability between 3.5%and 8.7% to participate once to an aggregate computationassuming that aggregates results are produced every 30 sec-onds, and a probability between 0.004% and 0.12% to doa repartitioning assuming that the probes partitioning ischecked every 1 minute. In all cases, the computation isdone in a few seconds at most and requires only modest re-sources. Moreover, the computation effort is inversely pro-portional to the probability to be picked.

8. CONCLUSIONThis paper proposes PAMPAS, a privacy-aware mobile

participatory sensing system based on a distributed archi-tecture and personal secure hardware. This combination al-lows PAMPAS to achieve the same level of privacy as cryp-tographic solutions without having to sacrifice generality,scalability, and accuracy. The proposed aggregation solu-tion is, to the best of our knowledge, the first proposal of adistributed protocol that is secure, efficient, and scalable andthat fits both the strict hardware constraints of secure per-sonal devices and the real-time constraints of participatorysensing applications. The experimental evaluation based onrepresentative hardware for secure platforms validates theproposed solution.

9. REFERENCES[1] T. Allard, B. Nguyen, and P. Pucheral. METAP:

revisiting privacy-preserving data publishing usingsecure devices. Distributed and Parallel Databases,32(2):191–244, 2014.

[2] ARM. ARM Security Technology - Building a SecureSystem using TrustZone Technology. ARM TechnicalWhite Paper, 2009.

(a) Scalability of the PAMPAS andBaseline protocols with Average function(top) and Median function (bottom)

(b) The partitioning costs (top) and thepartitioning imbalance factor (bottom)with different number of partitions

(c) Communication and computationcosts of Median function with differentnumber of partitions

Figure 9: Performance evaluations

[3] A. Baumann, M. Peinado, and G. Hunt. Shieldingapplications from an untrusted cloud with haven. InOSDI, pages 267–283, 2014.

[4] T. Brinkhoff. A framework for generatingnetwork-based moving objects. GeoInformatica,6(2):153–180, June 2002.

[5] J. W. S. Brown, O. Ohrimenko, and R. Tamassia.Haze: privacy-preserving real-time traffic statistics. InACM SIGSPATIAL, pages 540–543, 2013.

[6] M. L. Damiani. Location privacy models in mobileapplications: conceptual view and research directions.GeoInformatica, 18(4):819–842, 2014.

[7] Y.-A. de Montjoye, C. A. Hidalgo, M. Verleysen, andV. D. Blondel. Unique in the crowd: The privacybounds of human mobility. Scientific reports, 3, 2013.

[8] E. D’Hondta, M. Stevens, and A. Jacobs.Participatory noise mapping works! an evaluation ofparticipatory sensing as an alternative to standardtechniques for environmental monitoring. Pervasiveand Mobile Computing, 9(5):681–694, October 2013.

[9] G. Drosatos, P. S. Efraimidis, I. N. Athanasiadis, andM. Stevens. A privacy-preserving cloud computingsystem for creating participatory noise maps. InCOMPSAC, pages 581–586, 2012.

[10] M. Faezipour, M. Nourani, A. Saeed, andS. Addepalli. Progress and challenges in intelligentvehicle area networks. Magazine Communications ofthe ACM, 55(2):90–100, 2012.

[11] H. Gao, C. H. Liu, W. Wang, J. Zhao, Z. Song, X. Su,J. Crowcroft, and K. K. Leung. A survey of incentivemechanisms for participatory sensing. IEEE Comm.Surveys and Tutorials, 17(2):918–943, 2015.

[12] B. Hoh, T. Iwuchukwu, Q. Jacobson, D. Work, A. M.Bayen, R. Herring, J. C. Herrera, M. Gruteser,M. Annavaram, and J. Ban. Enhancing privacy andaccuracy in probe vehicle-based traffic monitoring viavirtual trip lines. IEEE Tran. on Mobile Computing,

11(5):849–864, 2012.

[13] N. Jain, S. Mishra, A. Srinivasan, J. Gehrke,J. Widom, H. Balakrishnan, U. Cetintemel,M. Cherniack, R. Tibbetts, and S. B. Zdonik. Towardsa streaming sql standard. In PVLDB 1(2), pages1379–1390, 2008.

[14] S. Nittel, J. C. Whittier, and Q. Liang. Real-timespatial interpolation of continuous phenomena usingmobile sensor data streams. In ACM SIGSPATIAL,pages 530–533, 2012.

[15] M. Penza. Cost action TD1105: New sensingtechnologies for environmental sustainability in smartcities. In IEEE SENSORS, 2014.

[16] R. A. Popa, A. J. Blumberg, H. Balakrishnan, andF. H. Li. Privacy and accountability for location-basedaggregate statistics. In CCS, pages 653–666, 2011.

[17] D. Quercia, I. Leontiadis, L. Mcnamara, C. Mascolo,and J. Crowcroft. Spotme if you can: Randomizedresponses for location obfuscation on mobile phones.In ICDCS, pages 363–372, 2011.

[18] A. Thiagarajan, L. Ravindranath, K. LaCurts,S. Madden, H. Balakrishnan, S. Toledo, andJ. Eriksson. Vtrack: accurate, energy-aware roadtraffic delay estimation using mobile phones. In ACMSenSys, pages 85–98, 2009.

[19] Q.-C. To, B. Nguyen, and P. Pucheral.Privacy-preserving query execution using adecentralized architecture and tamper resistanthardware. In EDBT, pages 487–498, 2014.

[20] D.-H. Ton-That, I. Sandu-Popa, and K. Zeitouni.PPTM: Privacy-aware participatory traffic monitoringusing mobile secure probes. In IEEE MDM, 2015.Demo paper.

[21] J. Yuan, Y. Zheng, W. Xie, X. Xie, G. Sun, andY. Huang. T-drive: driving directions based on taxitrajectories. In SIGSPATIAL, pages 99–108, 2010.

pampas: privacy-aware mobile participatory sensing using …borcea/papers/ssdbm16.pdf · 2016. 6....

Documents