
Structure based Data De-anonymization of Social Networks and Mobility Traces

Shouling Ji, Weiqing Li, and Raheem Beyah, Georgia Institute of Technology

Mudhakar Srivatsa, IBM T. J. Watson Research Center

Jing S. He, KSU

Presenter: Qin Liu, Chinese University of Hong Kong

Ji et al. Structure based Data De-anonymization

Introduction

• Social networking services are a fast-growing business nowadays

– Facebook, Twitter, Google+, LiveJournal, YouTube, …

• When users participate in online social network activities, their privacy faces potentially serious threats

– Create personal portfolio, post current location, …

• Countermeasures

– Naïve anonymization: removing “Personally Identifiable Information (PII)”

– Edge modification

– k-anonymity and its variants

• Still vulnerable to powerful structure-based de-anonymization attacks

– Narayanan-Shmatikov attack (IEEE S&P 2009)

– Srivatsa-Hicks attack (ACM CCS 2012)

– Others

Narayanan-Shmatikov attack (IEEE S&P 2009)

• Anonymized data: Twitter (crawled in late 2007)

– A microblogging service

– 224K users, 8.5M edges

• Auxiliary data: Flickr (crawled in late 2007/early 2008)

– A photo-sharing service

– 3.3M users, 53M edges

• Result: 30.8% of the users are successfully de-anonymized

[Figure: user mapping between the Twitter and Flickr graphs]

• Heuristics: eccentricity, edge directionality, node degree, revisiting nodes, reverse match

Srivatsa-Hicks attack (ACM CCS 2012)

• Anonymized data

– Mobility traces: St Andrews, Smallblue, and Infocom 2006

• Auxiliary data

– Social networks: Facebook and DBLP

• De-anonymize mobility traces using corresponding social networks

• Over 80% of users can be successfully de-anonymized

Other structural de-anonymization attacks

• Backstrom et al. attack (WWW 2007)

– Both active attacks and passive attacks

• Narayanan et al. attack (IJCNN 2011)

– A simplified version of the Narayanan-Shmatikov attack (IEEE S&P 2009)

– For breaching link privacy

• Pedarsani et al. attack (Allerton 2013)

– A Bayesian method based attack

Limitations of existing attacks

• Not scalable

– E.g., Backstrom et al. attack (WWW 2007) needs to create Sybil users before anonymized data release, which is not controllable or scalable

– E.g., Srivatsa-Hicks attack (CCS 2012) has a complexity of O(k! n^3), where k is the number of seeds, which is not scalable

• High computational cost

– E.g., Narayanan-Shmatikov attack (S&P 2009) has a complexity of O(n^k + n^4)

• Not general

– E.g., Narayanan-Shmatikov attack (S&P 2009) is designed for directed graphs

– E.g., Pedarsani et al. attack (Allerton 2013) is good for sparse graphs but bad for dense graphs

Our contributions

• Defined and measured three de-anonymization metrics

– Structural similarity, relative distance similarity, and inheritance similarity

• Proposed a Unified Similarity (US) based De-Anonymization (DA) framework

– Iteratively de-anonymize data with accuracy guarantee

• Generalized DA to an Adaptive De-Anonymization (ADA) framework

– To de-anonymize large-scale data without knowledge of the overlap size between the anonymized data and the auxiliary data

• Applied the proposed de-anonymization attacks to real world datasets

– Successfully de-anonymized three mobility traces: St Andrews, Infocom06, and Smallblue

– Successfully de-anonymized three social network datasets: ArnetMiner, Google+, and Facebook

Outline

• Background

• Preliminaries and Model

• De-anonymization

• Generalized Scalable De-anonymization

• Experiments

• Conclusion and Future Work

Preliminaries and Model

• Anonymized data graph

• Auxiliary data graph

• Attack Model

– A de-anonymization attack is a mapping of users from the anonymized graph to the auxiliary graph, i.e.,

Datasets – mobility traces

• Mobility traces (anonymized data) and social networks (auxiliary data) (same as Srivatsa-Hicks attack (ACM CCS 2012))

• Preprocess mobility traces to construct anonymized contact graphs (see Srivatsa and Hicks’s paper for details)

• Use social network as auxiliary data to de-anonymize mobility traces

Datasets – social networks

• ArnetMiner

– A coauthor network

– A weighted graph with weight indicating the number of coauthored papers

– 1,127 authors and 6,690 “coauthor” relationships

• Google+

– Two Google+ datasets crawled on July 19 and August 6 in 2011, denoted by JUL and AUG, respectively

– JUL: 5,200 users, 7,062 connections

– AUG: 5,200 users, 7,813 connections

• Facebook

– 63,731 users

– 1,269,502 friend relationships


De-anonymization

• High-Level Description

– Seed selection

– Mapping propagation

• Seed selection

– Identify a small number of seed mappings from the anonymized graph to the auxiliary graph

– Bootstrap the de-anonymization

• Mapping propagation

– De-anonymize the anonymized graph using multiple similarity measurements

Mapping Propagation

• Metrics

– Structural Similarity

– Relative Distance Similarity

– Inheritance Similarity

– Unified Similarity

• We also defined the weighted version of these metrics by considering the weights on edges

• Propagation framework

Structural Similarity

• Degree centrality

– The number of ties that a node has in a graph

Structural Similarity

• Closeness centrality

– How close a node is to other nodes in a graph

Structural Similarity

• Betweenness centrality

– A node’s global structural importance within a graph

Structural Similarity

• Defined as the cosine similarity between two nodes’ degree, closeness, and betweenness centralities
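As a rough illustration of the definition above, the following sketch builds a three-element centrality vector per node and compares two nodes by cosine similarity. The function names, the adjacency-dictionary graph format, and the normalizations (degree divided by n − 1, betweenness divided by the number of ordered node pairs) are assumptions made for this example and are not taken from the slides; the paper's exact definitions may differ.

```python
from collections import deque
from math import sqrt


def closeness(adj, node):
    """Closeness: reachable nodes divided by the sum of hop distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs, scaled to [0, 1]."""
    n = len(adj)
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        paths = {v: 0 for v in adj}  # number of shortest paths from s
        paths[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies, farthest first
            for u in preds[w]:
                dep[u] += paths[u] / paths[w] * (1 + dep[w])
            if w != s:
                score[w] += dep[w]
    pairs = (n - 1) * (n - 2)  # ordered pairs of other nodes
    return {v: (b / pairs if pairs else 0.0) for v, b in score.items()}


def centrality_vectors(adj):
    """Map each node to [degree, closeness, betweenness] centralities."""
    n = len(adj)
    btw = betweenness(adj)
    return {
        v: [len(adj[v]) / (n - 1) if n > 1 else 0.0, closeness(adj, v), btw[v]]
        for v in adj
    }


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Toy graphs (hypothetical): two 3-node paths.
anonymized = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
auxiliary = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
vec_a = centrality_vectors(anonymized)
vec_u = centrality_vectors(auxiliary)
center_vs_center = cosine(vec_a["b"], vec_u["y"])
center_vs_end = cosine(vec_a["b"], vec_u["x"])
```

In this toy case the two path centers have identical centrality vectors, so they score higher against each other than a center does against an endpoint.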

Relative Distance Similarity

• Defined as the cosine similarity between two nodes’ distance vectors to the seed nodes
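A minimal sketch of this metric, assuming that each node's distance vector holds its hop distances to the already-mapped seed nodes, listed in the same seed order on both graphs. The function names, the adjacency-dictionary format, and the use of the node count as a stand-in distance for unreachable seeds are assumptions for illustration only.

```python
from collections import deque
from math import sqrt


def hop_distances(adj, source):
    """Breadth-first hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_vector(adj, node, seeds):
    """Distances from node to each seed; unreachable seeds get len(adj)."""
    dist = hop_distances(adj, node)
    return [dist.get(s, len(adj)) for s in seeds]


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def relative_distance_similarity(adj_a, i, adj_u, j, seed_pairs):
    """seed_pairs is a list of (anonymized seed, auxiliary seed) mappings."""
    vec_i = distance_vector(adj_a, i, [a for a, _ in seed_pairs])
    vec_j = distance_vector(adj_u, j, [u for _, u in seed_pairs])
    return cosine(vec_i, vec_j)


# Toy graphs (hypothetical): two 4-node paths whose endpoints are the seeds.
anonymized = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
auxiliary = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
seeds = [(1, "a"), (4, "d")]
same_position = relative_distance_similarity(anonymized, 2, auxiliary, "b", seeds)
other_position = relative_distance_similarity(anonymized, 2, auxiliary, "c", seeds)
```

Node 2 sits at distances [1, 2] from the seeds, as does node b, while node c sits at [2, 1], so the first pair scores higher.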

Inheritance Similarity

• Characterizes the knowledge provided by the current mapping results

– Two nodes with more common mapped neighbors have a higher inheritance similarity score
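The idea can be sketched with a deliberately simplified score: count the neighbors of an anonymized node whose current mapping lands on a neighbor of the candidate auxiliary node, then normalize by the larger of the two degrees. This formula, the function name, and the data layout are assumptions for illustration; the slides do not give the paper's exact definition.

```python
def inheritance_similarity(adj_a, i, adj_u, j, mapping):
    """Fraction of neighbors shared through the current mapping.

    mapping: dict from already de-anonymized anonymized nodes to auxiliary nodes.
    """
    neighbors_j = set(adj_u[j])
    common = sum(
        1 for n in adj_a[i] if n in mapping and mapping[n] in neighbors_j
    )
    largest_degree = max(len(adj_a[i]), len(adj_u[j]))
    return common / largest_degree if largest_degree else 0.0


# Toy example (hypothetical): neighbors 1 and 2 are already mapped to a and b.
anonymized = {"i": [1, 2, 3], 1: ["i"], 2: ["i"], 3: ["i"]}
auxiliary = {
    "j": ["a", "b", "x"],
    "k": ["a"],
    "a": ["j", "k"],
    "b": ["j"],
    "x": ["j"],
}
current_mapping = {1: "a", 2: "b"}
score_j = inheritance_similarity(anonymized, "i", auxiliary, "j", current_mapping)
score_k = inheritance_similarity(anonymized, "i", auxiliary, "k", current_mapping)
```

Candidate j shares two mapped neighbors with i and candidate k shares one, so j receives the higher score.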

Unified Similarity (US)

• Considering the structural similarity, relative distance similarity, and inheritance similarity

[Equation: US defined as a weighted combination of the structural similarity, relative distance similarity, and inheritance similarity]
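One plausible reading of the combination is a weighted sum of the three scores. The sketch below assumes that linear form and uses equal default weights; both the form and the weight values are assumptions, since the slide text names the weights without giving their values.

```python
def unified_similarity(structural, relative_distance, inheritance,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three similarity scores (assumed linear form)."""
    c_s, c_d, c_i = weights
    return c_s * structural + c_d * relative_distance + c_i * inheritance


# Hypothetical scores and weights, chosen only to show the arithmetic:
# 0.5 * 1.0 + 0.3 * 0.8 + 0.2 * 0.5 = 0.84
us_score = unified_similarity(1.0, 0.8, 0.5, weights=(0.5, 0.3, 0.2))
```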

US based De-Anonymization (DA) Framework

• Step 1: seed identification by existing techniques

• Step 2: calculate two candidate node sets Ca and Cu from the anonymized graph and the auxiliary graph, respectively

• Step 3: calculate the US of each user from Ca to every user in Cu, and construct a weighted bipartite graph from Ca and Cu based on the calculated US scores

• Step 4: Seek a maximum weighted bipartite matching

• Step 5: Decide whether to accept a node de-anonymization result in the bipartite matching

• Go to step 2 if the end condition is not reached
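Steps 3 to 5 of one propagation round can be sketched as follows, assuming the US scores between the candidate sets have already been computed. The exhaustive search over permutations is a stand-in that is only practical for tiny candidate sets, and the fixed acceptance threshold is a hypothetical rule; the slides do not specify the matching algorithm or the acceptance criterion.

```python
from itertools import permutations


def max_weight_matching(scores, left, right):
    """Exhaustive maximum weighted bipartite matching (tiny inputs only).

    scores: dict keyed by (left node, right node); requires len(left) <= len(right).
    """
    best, best_total = {}, float("-inf")
    for perm in permutations(right, len(left)):
        total = sum(scores[(a, u)] for a, u in zip(left, perm))
        if total > best_total:
            best, best_total = dict(zip(left, perm)), total
    return best


def propagation_round(scores, cand_a, cand_u, threshold):
    """Match the candidate sets, then keep only pairs scoring above threshold."""
    matching = max_weight_matching(scores, cand_a, cand_u)
    return {a: u for a, u in matching.items() if scores[(a, u)] >= threshold}


# Hypothetical US scores between candidates {1, 2} and {"a", "b"}.
us_scores = {(1, "a"): 0.9, (1, "b"): 0.6, (2, "a"): 0.8, (2, "b"): 0.3}
matching = max_weight_matching(us_scores, [1, 2], ["a", "b"])
accepted = propagation_round(us_scores, [1, 2], ["a", "b"], threshold=0.7)
```

Here the matching pairs 1 with b and 2 with a (total 1.4) rather than taking the single best edge first (total 1.2), and only the pair scoring 0.8 passes the threshold. Pairs that are not accepted would return to the candidate pool for the next round.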


Generalized Scalable De-anonymization

• Core Matching Subgraph (CMS)

Adaptive De-Anonymization (ADA)

Identify initial CMS

Run DA on initial CMS

Update CMS or End


Experiments – de-anonymize mobility traces

Experiments – de-anonymize ArnetMiner

Experiments – de-anonymize Google+

Experiments – de-anonymize Facebook

Conclusion and Future Work

• Conclusion

– Proposed and examined several structural similarity metrics

– Designed a new scalable structural de-anonymization framework for mobility traces and social networks

– Validated the proposed de-anonymization framework on multiple mobility traces and social networks

• Future work

– More experiments on large-scale datasets

– De-anonymizability quantification (partially done in our ACM CCS 2014 paper)

– Secure data publishing system

Thank you and the presenter Qin Liu!

Shouling Ji

sji@gatech.edu

http://users.ece.gatech.edu/sji/
