Protecting location privacy through path confusion [1]

Quantifying Location Privacy

Reza Shokri, George Theodorakopoulos, Jean-Yves Le Boudec, and Jean-Pierre Hubaux

Presented By: Solomon Njorombe


Abstract

• Security issues raised by advances in personal communication
• Many Location-Privacy Preserving Mechanisms (LPPMs) have been proposed
• No systematic quantification, and incomplete assumptions

• Framework for analyzing LPPMs
  • Information and attacks available to the adversary
  • Formalizes attack performance

• Adversary inference attacks (accuracy, certainty, correctness)
• Implemented the Location Privacy Meter
• Assessed popular metrics (entropy and k-anonymity)
  • Low correlation with the adversary's success

Introduction


Introduction

• Smartphones with location sensors: GPS/triangulation
  • Convenient, but they leave traces of your whereabouts
  • These traces reveal habits, interests, relationships, and secrets

• Increased computing power
  • Data mining algorithms, parallel database analysis
  • Threat to privacy

• Users have the right to control the information they share
  • Share minimal information, or share only with trusted entities


Introduction: Motivation

Aim: advance the quantification of LPPM performance
• Why?
  • There is no unified, generic formal framework; hence divergent contributions and confusion about which LPPM is more effective
  • Humans are bad estimators of risks
  • A meaningful way to compare LPPMs is needed
  • The literature is not mature enough in this area


Introduction: Contributions

1. A generic model to formalize adversarial attacks
  • Defines tracking and localization on anonymous traces as a statistical inference problem
2. Statistical methods to evaluate the performance of such inference attacks
  • Expected estimation error as the right metric
3. Location Privacy Meter
4. Inappropriateness of existing metrics

Framework


Framework

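• Location privacy is a tuple $\langle \mathcal{U}, \mathcal{A}, \mathrm{LPPM}, \mathcal{O}, \mathrm{ADV}, \mathrm{METRIC} \rangle$:
  • $\mathcal{U}$: set of mobile users
  • $\mathcal{A}$: actual traces of the users
  • LPPM: Location-Privacy Preserving Mechanism; acts on $\mathcal{A}$ and produces $\mathcal{O}$
  • $\mathcal{O}$: traces observed by the adversary
  • ADV: the adversary; tries to infer $a$ having observed $o$, relying on knowledge of the LPPM and the users' mobility model
  • METRIC: metric for the performance and success of ADV, which implies the users' location privacy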


Framework: Mobile Users

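• $\mathcal{U} = \{u_1, u_2, \dots, u_N\}$: set of N mobile users within an area partitioned into M regions $\mathcal{R} = \{r_1, r_2, \dots, r_M\}$
• $\mathcal{T} = \{1, \dots, T\}$: set of time instants at which users can be observed; time is discrete
• Spatiotemporal positions of users are modeled through events and traces
  • Event: a tuple $\langle u, r, t \rangle$ where $u \in \mathcal{U}$, $r \in \mathcal{R}$, $t \in \mathcal{T}$
  • Trace of user u: a T-size vector of events $a_u = (a_u(1), a_u(2), \dots, a_u(T))$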
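A minimal sketch of the event and trace definitions above, assuming users, regions, and time instants are encoded as plain integers (the class `Event` and helper `make_trace` are hypothetical names, not part of the Location Privacy Meter):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """An event <u, r, t>: user u is in region r at time instant t."""
    user: int    # index into U = {u_1, ..., u_N}
    region: int  # index into R = {r_1, ..., r_M}
    time: int    # time instant in T = {1, ..., T}


def make_trace(user: int, regions: List[int]) -> Tuple[Event, ...]:
    """Build the trace a_u = (a_u(1), ..., a_u(T)) from one region per time instant."""
    return tuple(Event(user, r, t) for t, r in enumerate(regions, start=1))


a_u = make_trace(user=0, regions=[3, 3, 7, 12])
print(len(a_u), a_u[2])  # prints: 4 Event(user=0, region=7, time=3)
```

Freezing the dataclass makes events hashable, so traces can be used as dictionary keys when tallying candidate traces.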


Framework: Mobile Users

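• $\mathcal{A}_u$: set of all traces that may belong to user u
• Actual trace of u: the only true trace of u for the period $t = 1, \dots, T$, i.e. $a_u = (a_u(1), a_u(2), \dots, a_u(T))$
• Actual events: the events $a_u(1), a_u(2), \dots, a_u(T)$ of the actual trace of user u
• $\mathcal{A} = \mathcal{A}_{u_1} \times \mathcal{A}_{u_2} \times \dots \times \mathcal{A}_{u_N}$: set of all possible traces for all users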


Framework: Location-Privacy Preserving Mechanisms (LPPM)
• LPPM: a mechanism that modifies and distorts actual traces before they are exposed
• Different implementations:
  • Offline (e.g. applied to a database) vs. online (on the fly)
  • Centralized (central anonymity server) vs. distributed (on users' phones)
• Receives N actual traces and modifies them in two steps:
  • Obfuscation: the location of each event is replaced with a location pseudonym from $\mathcal{R}'$
  • Anonymization: the user part of each trace is replaced with a user pseudonym from $\mathcal{U}'$


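Framework: Location-Privacy Preserving Mechanisms (LPPM)
• Obfuscated event: $\langle u, r', t \rangle$ where $u \in \mathcal{U}$, $r' \in \mathcal{R}'$, $t \in \mathcal{T}$
• Obfuscated trace: $o_u = (o_u(1), o_u(2), \dots, o_u(T))$
• $\mathcal{O}_u$: set of all possible obfuscated traces of user u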



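Framework: Location-Privacy Preserving Mechanisms (LPPM)
• Obfuscation mechanism: a function that maps a trace $a_u \in \mathcal{A}_u$ into a random variable $O_u$ taking values in the set $\mathcal{O}_u$
  • Probability density function: $f_{a_u}(o_u) = \Pr\{O_u = o_u \mid A_u = a_u\}$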

• Methods used by LPPMs to reduce the accuracy and/or precision of the events' spatiotemporal information:
  • Perturbation
  • Adding dummy regions
  • Reducing precision (merging regions)
  • Location hiding



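Framework: Location-Privacy Preserving Mechanisms (LPPM)
• Anonymization mechanism: a function $\Sigma$ randomly chosen from the functions mapping $\mathcal{U}$ to $\mathcal{U}'$
• Drawn according to the probability function $g(\sigma) = \Pr\{\Sigma = \sigma\}$
• We consider a random permutation chosen uniformly among all N! possible permutations, i.e. $g(\sigma) = 1/N!$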
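A minimal sketch of the anonymization step, assuming unique pseudonyms 1 to N and traces stored as lists indexed by user (the function name `anonymize` is hypothetical). It draws the permutation uniformly at random, so every permutation has probability $g(\sigma) = 1/N!$:

```python
import random
from math import factorial


def anonymize(traces_by_user, rng):
    """Replace user identities with pseudonyms 1..N using a permutation sigma drawn
    uniformly at random, so g(sigma) = 1/N! for every permutation.
    traces_by_user[u] is the (already obfuscated) trace of user u."""
    n = len(traces_by_user)
    pseudonyms = list(range(1, n + 1))
    rng.shuffle(pseudonyms)  # Fisher-Yates shuffle: each ordering equally likely
    sigma = {u: pseudonyms[u] for u in range(n)}
    # What the adversary receives: traces keyed by pseudonym only.
    released = {sigma[u]: trace for u, trace in enumerate(traces_by_user)}
    g = 1 / factorial(n)  # probability of any single permutation
    return sigma, released, g


sigma, released, g = anonymize([["r1", "r2"], ["r3", "r3"], ["r2", "r1"]], random.Random(1))
```

The adversary sees only `released`; recovering `sigma` is the deanonymization problem addressed by the tracking attacks.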



Framework: Location-Privacy Preserving Mechanisms (LPPM)

[Figure: the LPPM as an instantiation of random variables, mapping the set of actual traces to the set of obfuscated traces and then to the set of anonymized (observed) traces]



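Framework: Location-Privacy Preserving Mechanisms (LPPM)
• Summarize the LPPM with the probability distribution $\mathrm{LPPM}_a(o) = \Pr\{O = o \mid A = a\}$, which gives the probability of mapping $a \in \mathcal{A}$ into $o \in \mathcal{O}$
• The adversary's aim is to reconstruct a when given o
• $\mathcal{O}_{\sigma(u)}$: set of all observable traces of user u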



Framework : Adversary

• Knows the anonymization and obfuscation probability distribution functions g and f
• Has access to training traces and users' public information
• Based on this information, constructs a mobility profile $P_u$ for each user
• Given the LPPM (i.e. f and g), the users' profiles $\{(u, P_u)\}$, and the observed traces $\{o_1, o_2, \dots, o_N\}$, the attacker runs an inference attack, formulating its objective as a question over a subset of users, regions, and time (U-R-T)



Framework : Adversary

• Presence/absence disclosure attacks: infer the user-region relationship over time
  1. Tracking attacks: ADV tries to find the full or partial sequence of a user's trace
  2. Localization attacks: ADV targets a single event in a user's trace
• Meeting disclosure attacks: ADV is interested in the proximity between two users (meeting at a given time)
• The paper's algorithm implements the general attack
  • General attack: try to recover the traces of all users


Framework : Evaluation

• Traces are generated probabilistically:
  • Actual traces: probabilistic over the users' mobility profiles
  • Observed traces: probabilistic over the LPPM

• The attack output can be:
  • A probability distribution over the possible outcomes
  • The most probable outcome
  • The expected outcome under the distribution of possible outcomes
  • Any function of the actual trace




• $\phi(\cdot)$: function expressing the attacker's objective; if its argument is the actual trace a, then $x_c = \phi(a)$ is the correct answer to the attack
• $\mathcal{X}$: set of values that $\phi(\cdot)$ can take for a given attack (e.g. M regions, N users, or $M^T$ traces of one user)
• The attacker cannot obtain the exact $x_c$, so the task is inherently probabilistic
  • Best hope: extract all the information about $x_c$ from the observed traces




• The extracted information is in the form $\Pr(x \mid o)$, for all values $x \in \mathcal{X}$ derivable from the observed traces o
• Uncertainty: the ambiguity of $\Pr(x \mid o)$ with respect to finding a unique answer (maximal under the uniform distribution)
• Inaccuracy: the difference between $\Pr(x \mid o)$ and $\hat{\Pr}(x \mid o)$, the adversary's estimate; the adversary only has an estimate because it does not have infinite resources
• However, uncertainty and inaccuracy do not quantify the user's privacy; correctness does




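• Correctness: the distance between the result of the attack and the real answer

[Figure: the three dimensions of adversary performance: accuracy, certainty, correctness]

• Accuracy and certainty are not necessarily equivalent to correctness
  • Consider a situation with insufficient training traces: the adversary may be certain about an answer that is wrong
• Only correctness really matters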




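• Accuracy: quantified with a confidence interval and a confidence level; the estimate $\hat{\Pr}(x \mid o)$ for some x lies within a confidence interval around $\Pr(x \mid o)$ with a given confidence level
  • Computing this for every x is prohibitively costly
• Certainty: quantified through entropy, $H = \sum_{x \in \mathcal{X}} \hat{\Pr}(x \mid o) \log \frac{1}{\hat{\Pr}(x \mid o)}$
  • A concentrated distribution has low entropy, a uniform one high entropy; higher entropy means lower certainty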
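A minimal sketch of the certainty metric, assuming the adversary's estimate $\hat{\Pr}(x \mid o)$ is given as a list of probabilities over $\mathcal{X}$ and using base-2 logarithms (the base is an arbitrary choice here). Normalizing by $\log_2 |\mathcal{X}|$, the entropy of the uniform distribution, maps the result into [0, 1]:

```python
import math


def entropy(dist):
    """Certainty metric: H = sum_x Pr(x|o) * log2(1 / Pr(x|o)); zero entries contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in dist if p > 0)


def normalized_entropy(dist):
    """Entropy divided by its maximum log2(|X|), which the uniform distribution reaches."""
    if len(dist) < 2:
        return 0.0
    return entropy(dist) / math.log2(len(dist))


print(entropy([0.25, 0.25, 0.25, 0.25]))       # uniform over 4 values: maximal uncertainty
print(normalized_entropy([1.0, 0.0, 0.0, 0.0]))  # all mass on one value: full certainty
```

Neither number depends on the true answer $x_c$, which is why entropy alone cannot measure correctness.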




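• Correctness: quantified as the expected distance between the true $x_c$ and the adversary's estimate $\hat{\Pr}(x \mid o)$
• If there is a distance $\lVert \cdot \rVert$ between members of $\mathcal{X}$, the expected estimation error is
$$\sum_{x \in \mathcal{X}} \hat{\Pr}(x \mid o)\, \lVert x - x_c \rVert$$
• If the distance is 0 iff $x = x_c$ and 1 otherwise, the incorrectness is
$$\sum_{x \neq x_c} \hat{\Pr}(x \mid o) = 1 - \hat{\Pr}(x_c \mid o)$$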
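A minimal sketch of the expected estimation error, assuming the adversary's estimate is a dictionary `{x: probability}` and that the distance is supplied by the caller (the region-index distance `abs(x - y)` in the example is purely illustrative). The 0/1 distance reduces it to the incorrectness $1 - \hat{\Pr}(x_c \mid o)$:

```python
def expected_estimation_error(posterior, x_c, distance):
    """Correctness: sum over x of Pr_hat(x|o) * ||x - x_c||."""
    return sum(p * distance(x, x_c) for x, p in posterior.items())


def incorrectness(posterior, x_c):
    """Special case with the 0/1 distance, equal to 1 - Pr_hat(x_c | o)."""
    return expected_estimation_error(posterior, x_c, lambda x, y: 0 if x == y else 1)


posterior = {1: 0.6, 2: 0.3, 5: 0.1}  # hypothetical estimate over three regions
x_c = 2                               # the true region, unknown to the adversary
print(incorrectness(posterior, x_c))                                   # about 0.7
print(expected_estimation_error(posterior, x_c, lambda x, y: abs(x - y)))  # about 0.9
```

Only the evaluator, who knows $x_c$, can compute these values; this is what the Location Privacy Meter reports as the user's location privacy.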




• So correctness is the metric that determines the user's privacy
• The adversary does not know $x_c$ and cannot observe it, so it cannot compute its own correctness
• Moreover, accuracy, certainty, and correctness are largely independent of each other


Location Privacy Meter


Location-Privacy Preserving Mechanisms

• Implemented two obfuscation mechanisms:
  1. Precision reducing (merging regions): drop the low-order bits of the region identifier, e.g. $\mu_x$ and $\mu_y$ are the numbers of dropped low-order bits of the x and y coordinates
  2. Location hiding: events are independently eliminated, i.e. the location is replaced with Ø with probability $\lambda_h$, the location-hiding level
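A minimal sketch of the two obfuscation mechanisms, assuming each region is identified by integer grid coordinates (x, y) and that precision reducing is applied before location hiding (both are assumptions for illustration; the function names are hypothetical). `None` stands for Ø:

```python
import random

HIDDEN = None  # stands for the empty location symbol (Ø)


def reduce_precision(region, mu_x, mu_y):
    """Merge regions by dropping the mu_x / mu_y low-order bits of the x / y coordinate."""
    x, y = region
    return (x >> mu_x, y >> mu_y)


def hide_location(region, lambda_h, rng):
    """Independently replace the location with Ø with probability lambda_h."""
    return HIDDEN if rng.random() < lambda_h else region


def obfuscate_trace(trace, mu_x, mu_y, lambda_h, rng):
    """Apply precision reducing, then location hiding, to every event of a trace."""
    return [hide_location(reduce_precision(r, mu_x, mu_y), lambda_h, rng) for r in trace]


rng = random.Random(7)
trace = [(5, 3), (6, 3), (6, 4)]
print(obfuscate_trace(trace, 1, 2, 0.0, rng))  # [(2, 0), (3, 0), (3, 1)]
print(obfuscate_trace(trace, 1, 2, 1.0, rng))  # [None, None, None]
```

With $\mu_x = 1$, regions 4 and 5 on the x axis collapse into the same merged region 2, which is exactly the loss of precision the adversary has to undo.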

• To import an LPPM into the tool, specify its probability functions by importing:
  • The anonymization function
  • The obfuscation function


Knowledge of the Adversary



• The adversary collects information about user mobility
  • It can take the form of events, transitions, or full/partial traces
• This can be encoded as:
  • Traces, or
  • A transition count matrix TC: an M × M matrix whose (i, j) entry is the number of transitions from $r_i$ to $r_j$ that the user made and that are not encoded in the traces
• The adversary also considers the users' mobility constraints
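A minimal sketch of tallying transitions into an M × M count matrix, the format the slide calls TC, assuming each trace is a list of region indices 0..M-1 at consecutive time instants, with `None` marking a missing observation (the function name is hypothetical):

```python
def transition_counts(traces, num_regions):
    """TC[i][j] = number of transitions from region i to region j.
    A None entry (no information at that time instant) breaks the transition."""
    tc = [[0] * num_regions for _ in range(num_regions)]
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            if prev is not None and nxt is not None:
                tc[prev][nxt] += 1
    return tc


tc = transition_counts([[0, 0, 1, 2], [2, None, 1, 1]], num_regions=3)
print(tc)  # [[1, 1, 0], [0, 1, 1], [0, 0, 0]]
```

Transitions adjacent to a gap are skipped rather than guessed; filling such gaps is the job of the knowledge constructor.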



• ADV models user mobility as a Markov chain
  • $P_u$: the user's transition matrix for their Markov chain
  • $P_u(i, j)$: probability that the user moves from $r_i$ to $r_j$ in the next time slot
• Objective: construct $P_u$ starting from the prior mobility information, with the broader goal of:
  • Estimating the underlying Markov chain
  • Filling in the training traces TT to obtain the estimated traces ET
• Uses Gibbs sampling, iterating until convergence
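A minimal sketch of turning a transition count matrix into a transition matrix $P_u$ by row normalization with a Laplace pseudo-count. This is only the simple smoothed maximum-likelihood estimate from counts; it is not the Gibbs-sampling procedure the knowledge constructor uses to fill in incomplete training traces. The function name and the `pseudo_count` parameter are assumptions:

```python
def estimate_transition_matrix(tc, pseudo_count=1.0):
    """Row-normalize an M x M transition-count matrix into P_u, where
    P_u[i][j] estimates Pr(next region = r_j | current region = r_i).
    pseudo_count > 0 keeps a small probability for transitions never seen."""
    m = len(tc)
    p_u = []
    for row in tc:
        total = sum(row) + pseudo_count * m
        if total == 0:
            p_u.append([1.0 / m] * m)  # no information about this region: uniform row
        else:
            p_u.append([(c + pseudo_count) / total for c in row])
    return p_u


p_u = estimate_transition_matrix([[1, 1, 0], [0, 1, 1], [0, 0, 0]])
print(p_u[0])  # [0.4, 0.4, 0.2]
```

Smoothing matters here: a zero entry in $P_u$ would make the attacks below rule out a transition the user might actually make.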


Tracking Attack

• ADV tries to reconstruct partial or complete actual traces

Maximum Likelihood Tracking Attack
• Objective: find the jointly most likely traces for all users, given the observed traces
• The search space has $N! \cdot M^{NT}$ elements (N! assignments of pseudonyms to users, times $M^T$ candidate traces for each of the N users), so a brute-force approach is not practical


Tracking Attack : Maximum Likelihood Tracking Attack

• Proceeds in two steps:
  1. Deanonymization
    • Cannot simply assign each user its most probable trace: multiple users may get the same trace
    • Compute the likelihood for all user-trace pairs
    • Create an edge-weighted bipartite graph between the set of users and the set of traces; the edge weight is the user-trace likelihood
    • Find the maximum-weight assignment using the Hungarian algorithm
  2. De-obfuscation
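A minimal sketch of the deanonymization step, assuming the user-trace log-likelihoods have already been computed into a matrix `loglik[u][i]` and are all finite. Maximizing the summed log-likelihood is the same as minimizing its negation, so the sketch runs a textbook O(n²m) Hungarian algorithm (potentials plus augmenting paths) on the negated weights. Function names are hypothetical:

```python
def hungarian_min_cost(cost):
    """Hungarian algorithm for an n x m cost matrix with n <= m.
    Returns assignment[row] = column minimizing the total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def deanonymize(loglik):
    """Maximum-weight assignment of users (rows) to observed traces (columns),
    where loglik[u][i] is the log-likelihood that user u produced trace i."""
    return hungarian_min_cost([[-w for w in row] for row in loglik])


print(deanonymize([[-1, -2], [-1, -10]]))  # [1, 0]
```

In the example both users individually prefer trace 0; the assignment gives it to user 1, for whom losing it would be far more costly.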


Tracking Attack : Maximum Likelihood Tracking Attack
• De-obfuscation
  • Uses the Viterbi algorithm, which maximizes the joint probability of the most likely trace
  • Recursively computes the values up to time T (the maximum probability), but the interest is in the trace itself, recovered by keeping back-pointers
  • Closely resembles finding the shortest path in an edge-weighted directed graph whose vertices are the set $\mathcal{R} \times \mathcal{T}$
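A minimal sketch of the de-obfuscation step with the Viterbi algorithm, for one user whose observed trace is already known. It assumes the mobility profile is given as an initial distribution `init` and a transition matrix `trans` ($P_u$), and that the obfuscation function is summarized by a hypothetical matrix `emit[r][o]` = Pr(LPPM outputs pseudonym o | actual region r). Working in log space turns the products into sums, which is the shortest-path view on the slide with edge weights $-\log$ probability:

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(init, trans, emit, observed):
    """Most likely actual trace of one user given its observed trace.
    init[r]     : Pr(actual region r at the first time instant)
    trans[i][j] : P_u(i, j), Pr(move from r_i to r_j)
    emit[r][o]  : Pr(LPPM outputs location pseudonym o | actual region r)
    observed    : list of location-pseudonym indices, one per time instant"""
    m = len(init)
    score = [_log(init[r]) + _log(emit[r][observed[0]]) for r in range(m)]
    back = []  # back[k][j]: best predecessor of region j at step k + 1
    for o in observed[1:]:
        prev, score, ptr = score, [], []
        for j in range(m):
            best_i = max(range(m), key=lambda i: prev[i] + _log(trans[i][j]))
            ptr.append(best_i)
            score.append(prev[best_i] + _log(trans[best_i][j]) + _log(emit[j][o]))
        back.append(ptr)
    last = max(range(m), key=lambda r: score[r])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from time T down to time 1
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


path, log_prob = viterbi([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                         [[0.8, 0.2], [0.2, 0.8]], [0, 0, 1, 1])
print(path)  # [0, 0, 1, 1]
```

The cost is O(T·M²) per user instead of enumerating all $M^T$ traces.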


Tracking Attack : Distribution Tracking Attack
• Computes the distribution of traces for each user, rather than only the most likely trace
• Uses the Metropolis-Hastings (MH) algorithm
  • Draws samples of $(\sigma, a)$ that are distributed according to the desired distribution $\Pr(\sigma, a \mid o)$
  • MH performs a random walk over the possible values of $(\sigma, a)$
• Can answer a wide range of U-R-T questions, but is very computationally intensive
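A simplified sketch of the Metropolis-Hastings random walk, restricted to the permutation σ only (the attack on the slide samples σ and the traces jointly). It assumes precomputed, finite user-trace log-likelihoods `loglik[u][i]` and at least two users. The proposal swaps the pseudonyms of two random users, which is symmetric, so the acceptance ratio is just the likelihood ratio. The function name and the `steps`/`burn_in` values are assumptions:

```python
import math
import random


def mh_deanonymization(loglik, steps=20000, burn_in=2000, seed=0):
    """Estimate Pr(sigma(u) = i | observations) by an MH random walk over permutations.
    The target is proportional to exp(sum_u loglik[u][sigma[u]])."""
    rng = random.Random(seed)
    n = len(loglik)
    sigma = list(range(n))
    rng.shuffle(sigma)  # arbitrary starting permutation
    counts = [[0] * n for _ in range(n)]
    for step in range(steps):
        u, v = rng.sample(range(n), 2)  # symmetric proposal: swap two users' pseudonyms
        delta = (loglik[u][sigma[v]] + loglik[v][sigma[u]]
                 - loglik[u][sigma[u]] - loglik[v][sigma[v]])
        if delta >= 0 or rng.random() < math.exp(delta):
            sigma[u], sigma[v] = sigma[v], sigma[u]  # accept the proposal
        if step >= burn_in:
            for w in range(n):
                counts[w][sigma[w]] += 1
    kept = steps - burn_in
    return [[c / kept for c in row] for row in counts]


marginals = mh_deanonymization([[0, -5, -5], [-5, 0, -5], [-5, -5, 0]])
```

Unlike the maximum-likelihood attack, the result is a distribution, so U-R-T questions can be answered by averaging over the samples.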


Localization Attack

• Find the location of user u at time t
• Output: a distribution over the possible regions, from which the most probable one is selected
• The attacker needs an estimate of which observed trace belongs to the user (maximum-weight assignment)
• The distribution can be computed using the Forward-Backward algorithm
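A minimal sketch of the Forward-Backward algorithm for the localization attack, under the same assumptions as a hidden Markov model: a known observed trace for the user, a mobility profile given by `init` and `trans`, and a hypothetical obfuscation matrix `emit[r][o]`. It returns, for each time t, the distribution over regions $\Pr(\text{region at } t \mid o)$. It does not rescale, so it is only suitable for short traces:

```python
def forward_backward(init, trans, emit, observed):
    """posteriors[t][r] = Pr(user in region r at time t | whole observed trace)."""
    m, T = len(init), len(observed)
    # Forward pass: alpha[t][r] = Pr(o_1..o_t, region r at t)
    alpha = [[0.0] * m for _ in range(T)]
    for r in range(m):
        alpha[0][r] = init[r] * emit[r][observed[0]]
    for t in range(1, T):
        for j in range(m):
            alpha[t][j] = emit[j][observed[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(m))
    # Backward pass: beta[t][r] = Pr(o_{t+1}..o_T | region r at t)
    beta = [[1.0] * m for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(m):
            beta[t][i] = sum(trans[i][j] * emit[j][observed[t + 1]] * beta[t + 1][j] for j in range(m))
    posteriors = []
    for t in range(T):
        w = [alpha[t][r] * beta[t][r] for r in range(m)]
        z = sum(w)
        posteriors.append([x / z for x in w])
    return posteriors


post = forward_backward([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                        [[0.8, 0.2], [0.2, 0.8]], [0, 0, 1, 1])
guess_t0 = max(range(2), key=lambda r: post[0][r])  # most probable region at t = 0
```

Taking the arg max of each row gives the localization guess, while the full row is what the incorrectness metric is computed on.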


Meeting Disclosure Attack

• Objective 1: given a pair of users (u and v), a region r, and a time t, the probability that they meet there
  • Computed as the product of the distributions of the two events, $\Pr(u \text{ in } r \text{ at } t) \cdot \Pr(v \text{ in } r \text{ at } t)$
  • These distributions are obtained through localization attacks
• Objective 2: just a pair of users; how often they would have met, and in which region
  • Answered using localization attacks
• Objective 3: a location and a time; the expected number of users present
  • Also answered through localization attacks
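A minimal sketch of the three objectives built on top of localization posteriors, assuming `post_u[t][r]` = Pr(u in r at t | observations) for each user and treating the two users' posteriors as independent, as the product rule above does (function names are hypothetical):

```python
def meeting_probability(loc_u, loc_v, r):
    """Objective 1: Pr(u and v both in region r), given their distributions at one time t."""
    return loc_u[r] * loc_v[r]


def expected_meetings(post_u, post_v):
    """Objective 2: expected number of time instants at which u and v share a region."""
    return sum(
        sum(a * b for a, b in zip(pu_t, pv_t))
        for pu_t, pv_t in zip(post_u, post_v)
    )


def expected_presence(posteriors_by_user, r, t):
    """Objective 3: expected number of users present in region r at time t."""
    return sum(post[t][r] for post in posteriors_by_user)


post_u = [[1.0, 0.0], [0.5, 0.5]]  # hypothetical posteriors, 2 time instants x 2 regions
post_v = [[0.8, 0.2], [0.5, 0.5]]
print(expected_meetings(post_u, post_v))             # about 1.3
print(expected_presence([post_u, post_v], r=0, t=0))  # about 1.8
```

Expected counts follow from linearity of expectation: each term is a probability of a single event, so they can be summed directly.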

Using The Tool: Evaluation of LPPMs



• Goals:
  1. Use the Location Privacy Meter to quantify the effectiveness of LPPMs
  2. Evaluate the effectiveness of entropy and k-anonymity for quantifying location privacy
• Location samples: N = 20 users, sampled at 5-minute intervals for 8 hours (T = 96), Bay Area, M = 40 regions (5 × 8 grid)
• Privacy mechanisms:
  • Precision reducing
  • Anonymization using a random permutation (unique pseudonyms 1 to N)



• To consider the strongest adversary: feed the Knowledge Constructor (KC) with the users' actual traces
• U-R-T attack scenarios:
  • LO-ATT (Localization Attack): for a user u and a time t, what is u's location at t?
  • MD-ATT (Meeting Disclosure Attack): for a pair of users, at how many time instants in $\mathcal{T}$ are the two in the same region?
  • AP-ATT (Aggregate Presence Attack): for a region r and a time t, what is the expected number of users present in r at t?
• Metric: adversary incorrectness



$LP_{\text{LO-ATT}}(u, t)$ for all users u and times t

• LPPM(µx, µy, λh)

• Incorrectness of the # of users



$LP_{\text{MD-ATT}}(u, v)$ for all pairs of users u, v


• Incorrectness of # of meetings



$LP_{\text{AP-ATT}}(r, t)$ for all regions r and times t


• Incorrectness of number of users in a region



• X-axis: users' location privacy
• Y-axis: normalized entropy

*** : LPPM(2, 3, 0.9) strong mechanism

… . : LPPM(1, 2, 0.5) medium

ooo : LPPM(1, 0, 0.0) Weak



• X-axis: users' location privacy
• Y-axis: normalized k-anonymity

*** : LPPM(2, 3, 0.9) strong mechanism

… . : LPPM(1, 2, 0.5) medium

ooo : LPPM(1, 0, 0.0) Weak

Conclusion


Conclusion

• A unified formal framework to describe and evaluate a variety of location-privacy preserving mechanisms with respect to various inference attacks
• LPPM evaluation is modeled as an estimation problem, and the Expected Estimation Error metric is provided
• Designed the Location-Privacy Meter tool to evaluate and compare location-privacy preserving mechanisms

Questions


Framework

$\mathcal{U}$: Set of mobile users
$\mathcal{R}$: Set of regions that partition the whole area
$\mathcal{T}$: Time period under consideration
$\mathcal{A}$: Set of all possible traces
$\mathcal{O}$: Set of all observable traces
$\mathcal{U}'$: Set of user pseudonyms
$\mathcal{R}'$: Set of location pseudonyms
$N$: Number of users
$M$: Number of regions
$T$: Number of considered time instants
$N'$: Number of user pseudonyms
$M'$: Number of location pseudonyms
$f$: Obfuscation function
$g$: Anonymization function
$a_u$: Actual trace of user u
$o_u$: Obfuscated trace of user u
$o_i$: Observed trace of the user with pseudonym i
$\mathcal{A}_u$: Set of all possible (actual) traces of user u
$\mathcal{O}_u$: Set of all possible obfuscated traces of user u
$\mathcal{O}_{\sigma(u)}$: Set of all observable traces of user u
$P_u$: Profile of user u
$\phi(\cdot)$: Attacker's objective
$\mathcal{X}$: Set of values that $\phi(\cdot)$ can take
