Privacy Preserving Publication of Moving Object Data
Joey Lei, CS295
Francesco Bonchi, Yahoo! Research
Avinguda Diagonal 177, Barcelona, Spain
04/19/23 1CS295 - Privacy and Data Management
Outline
• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Future Research
Location Privacy
• Growing prevalence of location-aware devices
  – Mobile phones and GPS devices
• Two Analysis Groups
  – Online
    • Real-time monitoring of moving objects and motion patterns
    • Development of location-based services (LBS)
      – e.g., Google Maps on the iPhone
  – Offline
    • Collection of traces left by moving objects
    • Offline analysis to extract behavioral knowledge
      – e.g., public transportation
Privacy Concerns
• Location data allows for intrusive inferences
  – Reveals habits
  – Social customs
  – Religious and sexual preferences
  – Unauthorized advertisement
  – User profiling
Offline Analysis
• Traffic Management Application
  – Paths (trajectories) of vehicles with GPS are recorded
• Geographic Privacy-aware Knowledge Discovery and Delivery (GeoPKDD)
  – Traffic data published for the city of Milan (Italy)
  – Car identifiers were replaced with pseudonyms
• Daily Commute Example
  – Bob's home and workplace are traceable by location systems and act as quasi-identifiers (QIDs)
  – Joining the data with a telephone directory can then reveal his identity
Definitions
• Anonymity Preserving Data Publishing of Moving Objects Databases
  – How to transform published location data while maintaining utility
• Moving Object Database (MOD)
  – A set of individuals, time points, and trajectories
Background: Location Based Services
• Ideals
  – Provide service without learning the user's exact position
  – Location data is forgotten once service is provided
• k-anonymity definition
  – A response to a request for location data is k-anonymous when it is indistinguishable from the spatial and temporal information of at least k – 1 other responses sent from different users
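As a rough illustration of the definition above, the following Python sketch counts how many distinct users are covered by a cloaked response. The observation layout `(user, x, y, t)`, the rectangular cloak, and both function names are assumptions made for this example; they do not come from the slides.

```python
def users_in_cloak(observations, box, interval):
    """Return the set of users with at least one observation inside the cloak.

    observations: iterable of (user, x, y, t) tuples (hypothetical layout)
    box:          (x_min, y_min, x_max, y_max) spatial cloak
    interval:     (t_start, t_end) temporal cloak
    """
    x_min, y_min, x_max, y_max = box
    t_start, t_end = interval
    return {
        user
        for user, x, y, t in observations
        if x_min <= x <= x_max and y_min <= y <= y_max and t_start <= t <= t_end
    }


def is_location_k_anonymous(observations, box, interval, k):
    """True if the cloak covers the requester plus at least k - 1 other users."""
    return len(users_in_cloak(observations, box, interval)) >= k
```

A cloak that covers only two distinct users is 2-anonymous but not 3-anonymous under this reading, no matter how many observations each user contributes.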
LBS: Location k-Anonymity
• Spatial Requirements
  – Ubiquity: a user visits at least k regions
  – Congestion: the number of users in a region is at least k
• One Way to Achieve This: Mix Zones
  – An area where LBS providers cannot trace a specific user's movement
  – Identity is replaced with pseudonyms
    • Users entering these zones at the same time are mixed together
LBS: Location Based Quasi-Identifier
• A spatio-temporal pattern that can uniquely identify one individual
  – Set of spatial areas and time intervals plus a recurrence formula
  – AreaCondominium [7am, 8am], AreaOfficeBldg [8am, 9am], AreaOfficeBldg [4pm, 6pm], AreaCondominium [5pm, 7pm]
  – Recurrence: 3.Weekdays ∗ 2.Weeks
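The pattern above can be represented as a small data structure plus a matcher. The sketch below is one possible reading, not the definition from the original work: it assumes the recurrence 3.Weekdays ∗ 2.Weeks means "the sequence is observed on at least 3 weekdays in each of at least 2 weeks". The names `Element`, `day_matches`, and `satisfies_recurrence`, and the `(area, datetime)` visit layout, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Element:
    area: str
    start_hour: float  # inclusive, 24-hour clock
    end_hour: float    # inclusive, 24-hour clock


def day_matches(pattern, visits):
    """True if one day's visits (sorted by time) contain the pattern in order."""
    idx = 0
    for area, when in visits:
        if idx == len(pattern):
            break
        element = pattern[idx]
        hour = when.hour + when.minute / 60
        if area == element.area and element.start_hour <= hour <= element.end_hour:
            idx += 1
    return idx == len(pattern)


def satisfies_recurrence(pattern, visits, days_per_week=3, weeks=2):
    """Assumed reading: >= days_per_week matching weekdays in >= weeks ISO weeks."""
    by_day = defaultdict(list)
    for area, when in sorted(visits, key=lambda v: v[1]):
        if when.weekday() < 5:  # Monday..Friday only
            by_day[when.date()].append((area, when))
    matches_per_week = defaultdict(int)
    for day, day_visits in by_day.items():
        if day_matches(pattern, day_visits):
            matches_per_week[tuple(day.isocalendar())[:2]] += 1
    good_weeks = sum(1 for n in matches_per_week.values() if n >= days_per_week)
    return good_weeks >= weeks


# The commute pattern from the slide, written with the hypothetical Element type.
COMMUTE = [
    Element("AreaCondominium", 7, 8),
    Element("AreaOfficeBldg", 8, 9),
    Element("AreaOfficeBldg", 16, 18),
    Element("AreaCondominium", 17, 19),
]
```

An individual whose requests satisfy `satisfies_recurrence(COMMUTE, visits)` is a candidate for re-identification through this quasi-identifier.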
LBS: Historical k-Anonymity
• In the offline context
  – A set of requests satisfies historical k-anonymity if there exist k – 1 personal histories of locations (trajectories), belonging to k – 1 different users, that are location-time consistent with it (indistinguishable)
Outline
• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Future Research
Clustering and Perturbation
• C&P sidesteps the inherent problems with location QIDs:
  – Each individual can have their own QIDs, which makes it difficult to define one QID for all individuals
  – Area(Home, Office, ??) [??am-??pm]
  – Recurrence: 7.Weekdays ∗ 52.Weeks
• Solution: anonymize trajectories instead
  – Microaggregation / k-member anonymity
Clustering and Perturbation
• Trajectories are not polylines, but cylindrical volumes with radius δ (the uncertainty radius)
• If another trajectory moves within the cylinder of the given trajectory, then the two trajectories are indistinguishable from each other ((k, δ)-anonymity set)
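The cylinder idea can be sketched in Python as a pairwise distance test. The sketch assumes every trajectory is a list of (x, y) points sampled at the same timestamps, and it treats a set as a (k, δ)-anonymity set when it has at least k trajectories that all stay within δ of one another at every timestamp. The function names are hypothetical.

```python
from itertools import combinations
from math import hypot


def colocalized(tr_a, tr_b, delta):
    """True if two equally sampled trajectories stay within delta at every timestamp."""
    if len(tr_a) != len(tr_b):
        raise ValueError("trajectories must share the same timestamps")
    return all(
        hypot(xa - xb, ya - yb) <= delta
        for (xa, ya), (xb, yb) in zip(tr_a, tr_b)
    )


def is_anonymity_set(trajectories, k, delta):
    """True if there are at least k trajectories and every pair is co-localized."""
    return len(trajectories) >= k and all(
        colocalized(a, b, delta) for a, b in combinations(trajectories, 2)
    )
```

A single point that drifts farther than δ from another trajectory is enough to break the anonymity set, which is why perturbation is needed in practice.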
Clustering and Perturbation
[Figure: (a) uncertainty trajectory; (b) anonymity set for two trajectories]
Achieving (k, δ)-anonymity
• Achieved by space translation
  – Slightly moving some observations in space
• Step One: cluster trajectories of similar sizes
  – NWA (Never Walk Alone)
    • All equivalence classes have the same time span and special timestamp requirements π (e.g., π = 60: only full hours, such as 1:00PM-2:00PM)
Achieving (k, δ)-anonymity
• Step Two: perturb trajectories within uncertainty radius δ (i.e., transformation into an anonymity set)
  – Grouping and Reconstruction
    • Finding the nearest matching points to group
    • Reconstruct a generalization for utility
    • Multi TGA and Fast TGA Algorithms
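One simple way to picture space translation is to pull every point of a cluster into a disc of radius δ/2 around the cluster's mean position at that timestamp, so any two translated points end up at most δ apart. The sketch below illustrates that idea under stated assumptions (trajectories already clustered and sampled at the same timestamps). It is not a reproduction of the NWA algorithm, and `translate_cluster` is a hypothetical name.

```python
from math import hypot


def translate_cluster(cluster, delta):
    """Move points so all trajectories in the cluster are within delta of each other.

    cluster: list of trajectories, each a list of (x, y) at shared timestamps.
    Points already within delta / 2 of the per-timestamp mean are left untouched.
    """
    radius = delta / 2
    result = [list(trajectory) for trajectory in cluster]
    for t in range(len(cluster[0])):
        # Mean position of the cluster at timestamp index t.
        cx = sum(trajectory[t][0] for trajectory in cluster) / len(cluster)
        cy = sum(trajectory[t][1] for trajectory in cluster) / len(cluster)
        for i, trajectory in enumerate(cluster):
            x, y = trajectory[t]
            dist = hypot(x - cx, y - cy)
            if dist > radius:
                # Shrink the offset from the mean so the point lands on the disc edge.
                scale = radius / dist
                result[i][t] = (cx + (x - cx) * scale, cy + (y - cy) * scale)
    return result
```

Only points that lie outside the disc are moved, which keeps the distortion small and preserves as much utility as possible.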
Outline
• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Future Research
Trajectory Generalization
[Figure: anonymization of three trajectories tr1, tr2 and tr3, based on point matching and removal, and spatio-temporal generalization]
Trajectory Reconstruction
Reference: Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining.
Quasi-identifier Methods
• QIDs are a sequence of locations with multiple sensitive values (locations)
  – Values are different from the perspective of each adversary
• Yet linkage attacks from all adversaries must be considered
Quasi-identifier Methods
• Possible Attack
  – T5 and t5
  – A match! We know that person visited b1
Space Generalization
• Each position is an exact point on a grid
• Generalizations become rectangles of nearby points
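A minimal Python sketch of this generalization step follows, assuming positions are (x, y) grid coordinates and that a generalized region is the smallest axis-aligned rectangle covering a group of nearby points. The function names are illustrative only.

```python
def generalize(points):
    """Return the bounding rectangle (x_min, y_min, x_max, y_max) of grid points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def covers(rect, point):
    """True if the point lies inside the rectangle (borders included)."""
    x_min, y_min, x_max, y_max = rect
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max
```

Publishing the rectangle instead of the exact points hides which point a moving object actually visited, at the cost of spatial precision.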
Attack Graph
• Privacy breach on prior example
• Definitions
  – I-Nodes (Individuals)
  – O-Nodes (Moving Object IDs)
Attack Graph
• If I1 is mapped to O2, there is no consistent mapping for I2 and I3
  – Both I2 and I3 would have to map to O3
• Conclusion
  – O1 must map to I1
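The elimination argument above can be reproduced by brute force: enumerate every one-to-one assignment of individuals to objects that respects the attack-graph edges, and keep only the candidates that survive in at least one assignment. The figure is missing from the transcript, so the example graph in the usage note is a hypothetical one chosen to be consistent with the bullets, and the function names are assumptions.

```python
from itertools import permutations


def consistent_assignments(edges):
    """Yield every one-to-one mapping individual -> object allowed by the edges.

    edges: dict mapping each I-node to the set of O-nodes it is linked to.
    Brute force over permutations, so only suitable for tiny graphs.
    """
    individuals = sorted(edges)
    objects = sorted({obj for candidates in edges.values() for obj in candidates})
    for perm in permutations(objects, len(individuals)):
        if all(obj in edges[ind] for ind, obj in zip(individuals, perm)):
            yield dict(zip(individuals, perm))


def surviving_candidates(edges):
    """For each individual, the objects it maps to in some consistent assignment."""
    survivors = {ind: set() for ind in edges}
    for assignment in consistent_assignments(edges):
        for ind, obj in assignment.items():
            survivors[ind].add(obj)
    return survivors
```

With the hypothetical graph `{"I1": {"O1", "O2"}, "I2": {"O2", "O3"}, "I3": {"O2", "O3"}}`, every I-node has degree 2, yet `surviving_candidates` leaves I1 with the single candidate O1, which is the breach described on the slide.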
Attack Graph
• Shortcomings of the basic k-anonymity definition
  – Standard k-anonymity states there should be at least k paths originating from each I-node (based on grouping)
  – What if we group the O-nodes to have at least k paths?
Attack Graph
• Privacy Breach
  – Assume I2, O5 are a pair
  – I1 maps to both O1, O2, but this is impossible!
    • I5 must map to O5
Final k-Anonymity Definition
• Every I-node has degree k or more
• The attack graph is symmetric
  – For edge (Ii, Oj) there is also an edge (Ij, Oi)
• [Figure: 2-anonymous attack graph]
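Both conditions are easy to check mechanically. The sketch below assumes individuals and objects are indexed 1..n and that an edge (i, j) stands for a link between Ii and Oj. The representation and the function name are choices made for this example.

```python
def is_k_anonymous_graph(n, edges, k):
    """Check the two conditions of the final definition on an attack graph.

    n:     number of I-nodes (and O-nodes), indexed 1..n
    edges: set of (i, j) pairs meaning I_i is linked to O_j, with 1 <= i, j <= n
    """
    degree = {i: 0 for i in range(1, n + 1)}
    for i, _ in edges:
        degree[i] += 1
    has_min_degree = all(d >= k for d in degree.values())
    # Symmetry: every edge (I_i, O_j) needs its mirror edge (I_j, O_i).
    is_symmetric = all((j, i) in edges for i, j in edges)
    return has_min_degree and is_symmetric
```

A graph in which every I-node has degree 2 can still fail the check when some edge lacks its mirror, which is the situation the symmetry condition is meant to exclude.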
Future Research
• Ad-hoc anonymization techniques for intended use of data
• Privacy Preserving Data Mining
  – Focus on the analysis methods instead of the publishing