privacy vulnerability of published anonymous mobility traces
Post on 01-Jan-2016
Privacy Vulnerability of Published Anonymous Mobility Traces
Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip (Purdue University)
Nageswara S. V. Rao (Oak Ridge National Laboratory)
Motivation: Collecting mobility traces
Mobile network applications
◦ Traffic monitoring, road surface sensing, radiation and chemical detection
Mobility traces are collected and published to assist the design, analysis, and evaluation of mobile networks
◦ E.g., CRAWDAD
Motivation: Privacy vulnerability
Measures are carried out to protect the privacy of the participants
◦ Traces are identified using a random but consistent and unique identifier that is not correlated with the real ID
◦ Spatial and temporal granularities are reduced
<11:32:12, Chris Ma, (41.89840,-87.61999)>
<11:30~11:35, ID-271, (41.89~41.90,-87.62~-87.61)>
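The two protection measures above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for any published data set: the function names, the 5-minute time bin, and the 0.01-degree cell are assumptions chosen so that the raw record on the slide maps to the ranges shown in the anonymized record.

```python
import math
import secrets


def make_pseudonym_table(real_ids):
    """Assign each real ID a random, unique, consistent pseudonym (hypothetical helper)."""
    numbers = list(range(len(real_ids)))
    secrets.SystemRandom().shuffle(numbers)  # random permutation, so pseudonyms never collide
    return {rid: f"ID-{n}" for rid, n in zip(real_ids, numbers)}


def coarsen(record, pseudonyms, time_bin_s=300, cell_deg=0.01):
    """Replace the real ID and reduce temporal and spatial granularity.

    record is (seconds since midnight, real ID, (lat, lon)).
    time_bin_s and cell_deg are assumed granularities, not values from the slides.
    """
    t, real_id, (lat, lon) = record
    t0 = (t // time_bin_s) * time_bin_s
    lat_idx = math.floor(lat / cell_deg)
    lon_idx = math.floor(lon / cell_deg)
    lat_range = (round(lat_idx * cell_deg, 6), round((lat_idx + 1) * cell_deg, 6))
    lon_range = (round(lon_idx * cell_deg, 6), round((lon_idx + 1) * cell_deg, 6))
    return ((t0, t0 + time_bin_s), pseudonyms[real_id], (lat_range, lon_range))


table = make_pseudonym_table(["Chris Ma"])
# 11:32:12 is 41532 seconds after midnight
published = coarsen((41532, "Chris Ma", (41.89840, -87.61999)), table)
```

The published record keeps only a time window and a cell, but the pseudonym stays the same across the whole trace, which is what lets an adversary link a few known snapshots to the full movement history.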
These measures are not enough!
◦ Participants can be openly observed
◦ Participants may leak their location information (snapshots of time and location pairs, termed side information) through web blogs, status updates in social networks, tweets, casual conversations, etc.
An adversary, who tries to identify the complete trace (movement history) of one or more participants, may succeed with high probability
Our contributions
Comprehensive study of attack strategies
◦ Various ways for side information collection
◦ Analytical proof of the optimality of the attack strategy
◦ Quantitative simulation results
Privacy implications of characteristics of real traces and synthetic traces
◦ Synthetic nodes are more sparsely placed
  – More easily identified but more difficult to meet
Agenda
◦ Problem formulation
◦ Analytical derivation
◦ Experimental analysis
◦ Conclusion
Problem formulation - trace sampling and publication
<t, R.B., (x,y)> <t’, IDi, (x’,y’)>
Problem formulation
An adversary tries to identify the complete movement history of the participant(s)
◦ Collects side information and compares it with the published traces
Possible attack scenarios
◦ Adversary infers the location of a victim indirectly (passive adversary)
◦ Adversary observes the movement of the victims physically (active adversary)
Passive adversary - infers snapshots of victim
Special case: reference times are sampling times
General case: reference times are not sampling times
◦ Infers the possible location of the node at reference times using a general mobility model (preferences of the nodes, physical constraints)
Attack approaches of passive adversary
Use of a Bayesian approach to determine the trace that gives the best match with the inferred location information
[Figure: published traces compared with noisy side information]
Attack approaches of passive adversary
For the special case (reference time = sampling time), with the assumption that noise is i.i.d.,
For the general case, with the assumptions that noise is i.i.d. and movement is Markovian,
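To make the Markovian assumption concrete, here is a minimal sketch of how a location distribution at a reference time could be derived from the last sampled cell. It is an illustration only, not the authors' derivation: the three-cell transition matrix `P` and the function names are hypothetical.

```python
def step(dist, P):
    """Advance a probability distribution over cells by one time step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def predict(start_cell, P, k):
    """Distribution over cells k steps after the node was sampled in start_cell."""
    dist = [0.0] * len(P)
    dist[start_cell] = 1.0
    for _ in range(k):
        dist = step(dist, P)
    return dist


# Hypothetical transition matrix: row i gives the probabilities of moving
# from cell i to each cell in one time step (each row sums to 1).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

# The trace was sampled in cell 0; the side information refers to a time
# two steps later, so the adversary compares it against this distribution.
dist_at_reference_time = predict(0, P, 2)
```

A trace whose predicted distribution puts high probability on the cell reported by the side information is a better match than one that puts little or none there.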
Attack approaches of passive adversary
◦ Maximum Likelihood Estimator (MLE) approach
◦ Minimum Square (MSQ) approach
◦ Basic (BAS) approach
◦ Weighted Exponential (EXP) approach
When noise is Gaussian, MLE and MSQ are equivalent
[Figure: three plots with Distance on the horizontal axis]
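The equivalence of MLE and MSQ under Gaussian noise can be checked with a small sketch for the special case (side information taken at sampling times). The assumptions here are mine, not the slides': noise is i.i.d. isotropic Gaussian with a known standard deviation `sigma`, every trace is equally likely a priori, and the trace data and function names are hypothetical. BAS and EXP are not sketched because the transcript does not define them.

```python
import math


def msq_score(trace, side_info):
    """Sum of squared distances between the trace and the noisy snapshots."""
    return sum(
        (trace[t][0] - x) ** 2 + (trace[t][1] - y) ** 2
        for t, (x, y) in side_info.items()
    )


def gaussian_log_likelihood(trace, side_info, sigma):
    """Log-likelihood of the snapshots if this trace belongs to the victim."""
    ll = 0.0
    for t, (x, y) in side_info.items():
        px, py = trace[t]
        d2 = (px - x) ** 2 + (py - y) ** 2
        ll += -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return ll


def best_match_mle(traces, side_info, sigma):
    return max(traces, key=lambda tid: gaussian_log_likelihood(traces[tid], side_info, sigma))


def best_match_msq(traces, side_info):
    return min(traces, key=lambda tid: msq_score(traces[tid], side_info))


# Hypothetical published traces: trace ID -> {sampling time: (x, y)}
traces = {
    "A": {0: (0, 0), 1: (1, 0), 2: (2, 0)},
    "B": {0: (0, 5), 1: (1, 5), 2: (2, 5)},
    "C": {0: (4, 4), 1: (4, 3), 2: (4, 2)},
}
# Hypothetical noisy side information: time -> observed (x, y)
side_info = {0: (0.3, 4.6), 2: (1.8, 5.5)}

mle_choice = best_match_mle(traces, side_info, sigma=1.0)
msq_choice = best_match_msq(traces, side_info)
```

With Gaussian noise the log-likelihood is a negative multiple of the squared-distance sum plus a constant that is the same for every trace, so both rules rank the traces identically. MSQ does not need `sigma`, which is why it is unaffected by a wrong guess about the noise magnitude.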
Active adversary - observes victims physically
◦ Adversary is one of the participants
◦ Adversary stays at a (popular) position
◦ Adversary travels between popular locations
Problem formulation
Why the two different cases?
◦ Active
  – Needs to consider how to collect the side information physically as time evolves
  – Adversary tries to identify as many victims as possible: plot of k-anonymity as a function of time
◦ Passive
  – Snapshots of victim are inferred (not collected) and less accurate in general
  – Adversary tries to identify one victim only: plot of correctness as a function of the number of pieces of side information
Attack strategy of active adversary
Algorithm of the attack (in action)
[Figure: candidate trace IDs for each real ID, narrowed as the adversary observes victims at times t1 and t2]
real ID: candidate trace IDs
◦ Initially: 1: A, B, C; 2: A, B, C; 3: A, B, C
◦ After observations: 1: A, B; 2: A, B; 3: A, B, C
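The candidate-set narrowing in the figure can be sketched as follows. This is a minimal reading of the figure under stated assumptions, not the authors' algorithm: each victim the adversary physically sees in a cell at some time keeps only the published traces that were in that cell at that time, and the size of the remaining candidate set is taken as that victim's k-anonymity. The trace data, cell labels, and function names are hypothetical.

```python
def init_candidates(real_ids, trace_ids):
    """Before any observation, every trace is a candidate for every victim."""
    return {rid: set(trace_ids) for rid in real_ids}


def observe(candidates, traces, t, cell, seen_real_ids):
    """Narrow the candidates of each victim seen in `cell` at time `t`."""
    present = {tid for tid, trace in traces.items() if trace.get(t) == cell}
    for rid in seen_real_ids:
        candidates[rid] &= present
    return candidates


def k_anonymity(candidates):
    """Number of traces each victim is still hidden among."""
    return {rid: len(c) for rid, c in candidates.items()}


# Hypothetical published traces: trace ID -> {time: cell}
traces = {
    "A": {1: "X", 2: "Y"},
    "B": {1: "X", 2: "Z"},
    "C": {1: "W", 2: "Z"},
}

candidates = init_candidates([1, 2, 3], traces)
# At t1 the adversary is in cell X and sees victims 1 and 2.
observe(candidates, traces, 1, "X", [1, 2])
state_after_t1 = {rid: sorted(c) for rid, c in candidates.items()}
# At t2 the adversary is in cell Y and sees victim 1 only.
observe(candidates, traces, 2, "Y", [1])
anonymity_after_t2 = k_anonymity(candidates)
```

After the first observation the candidate sets match the second state in the figure (victims 1 and 2 narrowed to A, B; victim 3 still A, B, C). A victim is fully identified once its k-anonymity reaches 1.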
Experimental analysis
Basic information
◦ Real traces
  – 536 San Francisco taxicabs
  – 2348 Shanghai Grid buses
◦ Synthetic traces
  – Using map size and average speed computed from taxicab traces
  – Random waypoint (with different maximum trip lengths)
  – Random walk
◦ Spatial granularity = 1 km
◦ Temporal granularity = 1 minute
(unless stated otherwise)
Characteristics of the traces - distance between traces
Real traces are closer to each other on average
◦ Bus traces have a broader range
For synthetic traces, the shorter the trip length, the further away they are from each other in general
Significant observations
Lack of preferred locations and random initial location of the synthetic traces
◦ Nodes are more sparsely distributed in the network
Implications:
◦ For adversary in general
  – Can easily identify the trace of a synthetic node since no other traces share a similar path
◦ For active adversary
  – May take a longer time to meet each synthetic node
Attack performance - passive adversary (special case)
Special case: side information inferred at sampling times of traces
Correct assumption of noise (Gaussian)
Cab traces
Observations
◦ MLE and MSQ perform equally well
◦ BAS gives the fewest wrong conclusions initially
Attack performance - passive adversary (special case)
Random waypoint traces
Most efficient attack
◦ Traces have very different paths
Attack performance - passive adversary (special case)
Incorrect assumption of noise
◦ Assumption: Uniform
◦ Actual: Gaussian
Cab traces
Observations
◦ MLE performs much worse
Attack performance - passive adversary (general case)
General case: side information at times different from trace sampling times
Worst-case scenario: all times are different
Infer the location of the victim using the mobility model
Gaussian noise (no noise as best performance bound)
Cab traces
Summary - passive adversary
For passive adversary
◦ MLE and MSQ give the best performance among the four approaches in terms of the fraction of correct conclusions
◦ Since MLE relies on knowledge of the type of noise and its magnitude, MSQ is the preferred, more robust attack approach
Attack performance - active adversary as one of the mobile nodes
Higher attack efficiency for real traces
◦ Mobile nodes are more likely to visit the same set of locations at the same time
◦ Synthetic nodes are more sparsely distributed in the network
1 time step = 1 minute
Attack performance - active adversary who stays at one of the cells
[Figure panels: cabs, buses, random waypoint, random walk]
Observations
◦ Comparing real traces and synthetic traces
  – Attacks on real traces are more efficient: k-anonymity drops more quickly
◦ Popular cells in real traces and random waypoint traces are more aggregated together
◦ Being at a popular cell does not necessarily result in higher attack efficiency
Attack performance - active adversary who moves among popular cells
The ability to move among popular cells improves attack efficiency
◦ Improvement is more significant if node movements are more localized
◦ Visiting more cells does not necessarily improve efficiency
Conclusion
Study how privacy leaks through trace publication
◦ Under different adversary strategies to collect side information
◦ Using different mobile traces with different characteristics
Experimentally show that the adversary is able to identify the trace of a victim from the published set with high probability