towards identity anonymization on graphs. introduction

TOWARDS IDENTITY ANONYMIZATION ON GRAPHS

Upload: molly-thompson

Post on 24-Dec-2015



TRANSCRIPT

Page 1: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

TOWARDS IDENTITY ANONYMIZATION ON GRAPHS

Page 2: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

INTRODUCTION

• Removing the identities of the nodes before publishing graph or social-network data does not guarantee privacy.

• The structure of the graph itself, and in its most basic form the degrees of the nodes, can reveal the identities of individuals.

• We call a graph k-degree anonymous if, for every node v, there exist at least k − 1 other nodes in the graph with the same degree as v.

Page 3: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

MOTIVATION

• Social networks, online communities, peer-to-peer file sharing and telecommunication systems can be modelled as complex graphs.

• These graphs are of significant importance in various application domains such as marketing, psychology and homeland security.

Page 4: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

MOTIVATION

• In a social network, nodes correspond to individuals or other social entities, and edges correspond to social relationships between them.

• http://www.yasiv.com/facebook

• https://apps.facebook.com/touchgraph

Page 5: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

THE CHALLENGE

How can we minimally modify the graph to protect the identity of each individual involved without losing the information it carries?

Page 6: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

CATEGORIES OF PRIVACY BREACHES IN SOCIAL NETWORKS

• Identity disclosure: the identity of the individual who is associated with the node is revealed.

• Link disclosure: sensitive relationships between two individuals are disclosed.

• Content disclosure: the privacy of the data associated with each node is breached, e.g., the email messages sent and/or received by the individuals in an email communication graph.

Page 7: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

NOTES

• k-anonymization is used for content disclosure.

• In this paper, we focus on identity disclosure.

Page 8: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM

• Given a graph G and an integer k, modify G via a set of edge-addition (or deletion) operations in order to construct a new k-degree anonymous graph Ĝ, in which every node has the same degree as at least k − 1 other nodes.

Page 9: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

CHALLENGE

We want to preserve the utility of the original graph while at the same time satisfying the degree-anonymity constraint.

Page 10: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM DEFINITION

• The graph is G(V, E).

• dG is the degree sequence of G, a vector of size n = |V|.

• dG(i) is the degree of the i-th node of G.

• Entries in d are ordered in decreasing order of the degrees they correspond to, that is, d(1) ≥ d(2) ≥ … ≥ d(n).

• d[i, j] is the subsequence of d that contains elements i, i + 1, …, j − 1, j.

Page 11: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM DEFINITION

• Definition 1

A vector of integers v is k-anonymous if every distinct value in v appears at least k times.

For example, the vector v = [5, 5, 3, 3, 2, 2, 2] is 2-anonymous.

• Definition 2

A graph G(V, E) is k-degree anonymous if the degree sequence of G, dG, is k-anonymous.

• This property prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes.
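Definitions 1 and 2 can be checked directly. The following is a minimal Python sketch; the function names and the edge-list representation of the graph are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def is_k_anonymous(v, k):
    """Definition 1: every distinct value in v appears at least k times."""
    return all(count >= k for count in Counter(v).values())


def is_k_degree_anonymous(n, edges, k):
    """Definition 2: the degree sequence of the graph is k-anonymous.

    The graph is assumed to be simple and undirected, with nodes 0..n-1
    and edges given as (u, w) pairs.
    """
    degree = [0] * n
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    return is_k_anonymous(degree, k)


print(is_k_anonymous([5, 5, 3, 3, 2, 2, 2], 2))  # the example vector from Definition 1
```

A 4-cycle, for instance, is 4-degree anonymous because all four nodes have degree 2.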

Page 12: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM DEFINITION

Page 13: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM 1

• The input is a graph G(V, E) and an integer k.

• The output is a k-degree anonymous graph Ĝ(V, Ê).

• We restrict the graph modification operations to edge additions, so every edge of G is also an edge of Ĝ.

• The graph-anonymization cost should be minimized (that is, the distance between G and Ĝ).

• =

Page 14: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM 1

• We can naturally relax the requirement that all original edges are kept to the one where Ê ∩ E ≈ E rather than Ê ∩ E = E.

• We call this Relaxed Graph Anonymization.

Page 15: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

THE APPROACH

• 1- Starting from d, the degree sequence of G, we construct a new degree sequence d̂ which is k-anonymous and minimizes the cost.

• 2- Starting from d̂, we construct a graph Ĝ(V, Ê) such that dĜ = d̂ and Ê ∩ E = E.

Page 16: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM 2 FROM STEP 1 (DEGREE ANONYMIZATION)

• Input is d (the degree sequence of the graph G(V, E)) and an integer k.

• Output is a k-anonymous sequence d̂ such that L1(d̂ − d) is minimized.

Page 17: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

PROBLEM 3 FROM STEP 2 (GRAPH CONSTRUCTION)

• The inputs are the graph G(V, E) and a k-anonymous degree sequence d̂.

• The output is a graph Ĝ(V, Ê) such that dĜ = d̂ and Ê ∩ E = E, or Ê ∩ E ≈ E for the relaxed version.

Page 18: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

DEGREE ANONYMIZATION

• We can construct a set of dynamic-programming equations that solve the Degree Anonymization problem.

• The running time is O(n²).

Page 19: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

DEGREE ANONYMIZATION

• We can improve the running time of the DP algorithm from O(n²) to O(nk).
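The dynamic-programming equations themselves are not reproduced in this transcript. As a sketch under that caveat, the Python below assumes the usual formulation: the sorted sequence is cut into consecutive groups of at least k nodes, and every node in a group is raised to the group's highest degree. Limiting each group to at most 2k − 1 nodes (a larger group can always be split without raising the cost) is what keeps the work at O(nk). The names anonymize_degrees_dp, group_cost, best and cut are illustrative.

```python
def anonymize_degrees_dp(d, k):
    """Return (cost, d_hat) for a degree sequence d sorted in non-increasing order.

    d_hat is k-anonymous, only increases degrees, and minimizes
    L1(d_hat - d) among such sequences (assumed formulation, see lead-in).
    """
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")
    if any(d[i] < d[i + 1] for i in range(n - 1)):
        raise ValueError("d must be sorted in non-increasing order")

    # prefix[i] = d[0] + ... + d[i-1], so group costs are O(1).
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def group_cost(i, j):
        """Cost of raising every entry of d[i:j] to d[i]."""
        return d[i] * (j - i) - (prefix[j] - prefix[i])

    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimum cost for the prefix d[0:i]
    cut = [0] * (n + 1)     # cut[i]: start index of the last group in d[0:i]
    best[0] = 0
    for i in range(k, n + 1):
        # The last group is d[t:i] with k <= i - t <= 2k - 1.
        for t in range(max(0, i - 2 * k + 1), i - k + 1):
            if best[t] == inf:
                continue
            cost = best[t] + group_cost(t, i)
            if cost < best[i]:
                best[i], cut[i] = cost, t

    # Walk the cuts backwards to rebuild the anonymized sequence.
    d_hat = list(d)
    i = n
    while i > 0:
        t = cut[i]
        for j in range(t, i):
            d_hat[j] = d[t]
        i = t
    return best[n], d_hat


print(anonymize_degrees_dp([4, 3, 2, 1], 2))
```

For d = [4, 3, 2, 1] and k = 2 the cheapest grouping is [4, 3] and [2, 1], which costs one unit per group.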

Page 20: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

DEGREE ANONYMIZATION

• For completeness, we also give a Greedy linear-time alternative algorithm for the Degree Anonymization problem.

• The Greedy algorithm first forms a group consisting of the first k highest-degree nodes. Then it checks whether it should merge the (k + 1)-th node into the previously formed group or start a new group at position (k + 1).

• To make this decision, the algorithm compares the cost of merging the (k + 1)-th node into the current group with the cost of starting a new group at that position.
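The two cost formulas are missing from this transcript. The Python sketch below therefore assumes one plausible pair: the merge cost is the gap between the group's degree and the candidate node plus the cost of grouping the next k nodes, and the new-group cost is the cost of grouping the candidate with the k − 1 nodes after it. The function names and the handling of short tails are also assumptions.

```python
def group_cost(d, i, j):
    """Cost of raising d[i..j] (0-based, inclusive) to d[i]."""
    return sum(d[i] - d[l] for l in range(i, j + 1))


def anonymize_degrees_greedy(d, k):
    """Greedy grouping of a non-increasing degree sequence d into a k-anonymous one."""
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")
    d_hat = list(d)
    start = 0
    while start < n:
        if n - start < 2 * k:
            end = n  # too few nodes left for two groups: take them all
        else:
            end = start + k  # the group starts with the k highest remaining degrees
            while end < n:
                if n - end < k:
                    end = n  # the leftover cannot form its own group
                    break
                # Assumed costs for the node at position `end` (see lead-in).
                c_merge = (d[start] - d[end]) + group_cost(d, end + 1, min(end + k, n - 1))
                c_new = group_cost(d, end, end + k - 1)
                if c_merge > c_new:
                    break  # cheaper to start a new group here
                end += 1
        for i in range(start, end):
            d_hat[i] = d[start]
        start = end
    return d_hat


print(anonymize_degrees_greedy([4, 3, 2, 1], 2))
```

Each decision only looks at the next k nodes, so the result is k-anonymous but not guaranteed to be the cheapest such sequence.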

Page 21: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

GRAPH CONSTRUCTION

• Inputs are the graph G(V, E) and the desired k-anonymous degree sequence d̂ (which is the output of the DP or Greedy algorithm).

• Output is a k-degree anonymous graph Ĝ(V, Ê) such that dĜ = d̂ and Ê ∩ E = E.

Page 22: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

REALIZABILITY OF DEGREE SEQUENCE

• A degree sequence d, with d(1) ≥ … ≥ d(n), is called realizable if and only if there exists a simple graph whose nodes have precisely this sequence of degrees.

• “Lemma 1”: A degree sequence d with d(1) ≥ d(2) ≥ … ≥ d(i) ≥ … ≥ d(n) and Σd(i) even is realizable if and only if

Σ_{i=1}^{l} d(i) ≤ l(l − 1) + Σ_{i=l+1}^{n} min{l, d(i)}, for every 1 ≤ l ≤ n − 1.
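The condition of Lemma 1 (the Erdős–Gallai condition) can be tested mechanically. The sketch below is a straightforward quadratic-time check; the function name is illustrative, and the loop also tests l = n, which is harmless and covers the degenerate one-node case.

```python
def is_realizable(d):
    """Check whether d is the degree sequence of some simple graph (Lemma 1)."""
    d = sorted(d, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for l in range(1, n + 1):
        lhs = sum(d[:l])
        rhs = l * (l - 1) + sum(min(l, x) for x in d[l:])
        if lhs > rhs:
            return False
    return True


print(is_realizable([3, 3, 2, 2, 2]), is_realizable([3, 3, 1, 1]))
```

The sequence [3, 3, 1, 1] fails at l = 2: two nodes of degree 3 among four nodes would force every other node to have degree at least 2.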

Page 23: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

THE CONSTRUCTGRAPH ALGORITHM:

Takes as input the desired degree sequence d and outputs a graph with exactly this degree sequence, if such a graph exists.

Otherwise it outputs “No”.
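The slide with the algorithm's pseudocode did not survive in this transcript. As a stand-in, the sketch below uses the classical Havel–Hakimi construction, which has the same input and output contract: it repeatedly takes the node with the largest residual degree and connects it to the nodes with the next-largest residual degrees. The function name and the use of None for “No” are assumptions.

```python
def construct_graph(d):
    """Return a set of edges (u, w), u < w, realizing degree sequence d, or None.

    Nodes are 0..len(d)-1 and node i must end up with degree d[i].
    """
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return None
    residual = list(d)
    edges = set()
    while True:
        v = max(range(n), key=lambda u: residual[u])
        if residual[v] == 0:
            return edges  # every node has reached its target degree
        need = residual[v]
        residual[v] = 0  # v is completed in this round and never revisited
        candidates = sorted(
            (u for u in range(n) if u != v and residual[u] > 0),
            key=lambda u: -residual[u],
        )
        if len(candidates) < need:
            return None  # the sequence is not realizable
        for u in candidates[:need]:
            edges.add((min(u, v), max(u, v)))
            residual[u] -= 1


print(construct_graph([3, 3, 2, 2, 2]))
```

Because a completed node is never chosen again, no pair of nodes is connected twice and the result is a simple graph.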

Page 24: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

REALIZABILITY OF DEGREE SEQUENCES WITH CONSTRAINTS

• Notice that Lemma 1 is not directly applicable to the Graph Construction problem.

• This is because we also require that E ⊆ Ê.

We want to devise an algorithm for constructing a degree-anonymous graph Ĝ which is a supergraph of G, if such a graph exists. We call this algorithm Supergraph; it is an extension of the ConstructGraph algorithm.
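One way to picture the extension: work on the residual degrees (target degree minus current degree in G) and never connect two nodes that are already adjacent. The Python sketch below follows that idea; it is an assumption about how such an extension could look, not the authors' pseudocode. The function name, the edge-list input and the returned status strings are illustrative.

```python
def supergraph(n, edges, d_hat):
    """Try to add edges to G so that node i reaches degree d_hat[i].

    Returns (status, new_edges): "Yes" with the added edges, "No" when the
    target is provably unreachable with edge additions alone, or "Unknown"
    when this greedy attempt gets stuck.
    """
    adj = [set() for _ in range(n)]
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    residual = [d_hat[v] - len(adj[v]) for v in range(n)]
    # Degrees cannot decrease, and each new edge adds 2 to the degree sum.
    if any(r < 0 for r in residual) or sum(residual) % 2 != 0:
        return "No", None
    new_edges = set()
    while True:
        v = max(range(n), key=lambda u: residual[u])
        if residual[v] == 0:
            return "Yes", new_edges
        need = residual[v]
        residual[v] = 0
        # Only nodes that still need edges and are not yet neighbours of v.
        candidates = sorted(
            (u for u in range(n)
             if u != v and residual[u] > 0 and u not in adj[v]),
            key=lambda u: -residual[u],
        )
        if len(candidates) < need:
            return "Unknown", None
        for u in candidates[:need]:
            adj[u].add(v)
            adj[v].add(u)
            new_edges.add((min(u, v), max(u, v)))
            residual[u] -= 1


print(supergraph(3, [(0, 1), (1, 2)], [2, 2, 2]))
```

On the path 0-1-2 with target degrees [2, 2, 2], the only missing edge is (0, 2), which closes the triangle.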

Page 25: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

THE PROBING SCHEME

• If the Supergraph algorithm returns a graph Ĝ, then we guarantee that the least number of edge additions has been made.

• If Supergraph returns “No” or “Unknown”, we tolerate some additional edge additions: the Probing scheme forces the Supergraph algorithm to output the desired k-degree anonymous graph at a little extra cost.

Page 26: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

THE PROBING SCHEME

Page 27: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

RELAXED GRAPH CONSTRUCTION

Most of the edges of the original graph appear in the degree-anonymous graph as well, but not necessarily all of them.

Page 28: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

RELAXED GRAPH CONSTRUCTION

• The Greedy_Swap algorithm is a greedy heuristic that, given Ĝ0 and G, transforms Ĝ0 into Ĝ(V, Ê) with degree sequence dĜ = d̂ = dĜ0 and Ê ∩ E ≈ E.

• Here Ĝ0 is the output of the ConstructGraph algorithm. Although it is k-degree anonymous, its structure may be quite different from the original graph G(V, E).
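The pseudocode slide for Greedy_Swap is not in this transcript. The sketch below illustrates the general idea with degree-preserving edge swaps: two edges (a, b) and (c, d) are replaced by (a, c), (b, d) or by (a, d), (b, c) whenever that increases the overlap with the original edge set E. Scanning every pair of edges on each round is an assumption made for simplicity, and the function names are illustrative.

```python
from itertools import combinations


def norm(u, w):
    """Canonical form of an undirected edge."""
    return (u, w) if u < w else (w, u)


def greedy_swap(edges_hat0, edges_orig):
    """Swap edges of G0 to maximize overlap with E while keeping all degrees."""
    E = {norm(*e) for e in edges_orig}
    cur = {norm(*e) for e in edges_hat0}
    while True:
        best_gain, best_swap = 0, None
        for (a, b), (c, d) in combinations(sorted(cur), 2):
            if len({a, b, c, d}) < 4:
                continue  # a swap needs four distinct endpoints
            for new1, new2 in ((norm(a, c), norm(b, d)), (norm(a, d), norm(b, c))):
                if new1 in cur or new2 in cur:
                    continue  # would create a parallel edge
                gain = (new1 in E) + (new2 in E) - ((a, b) in E) - ((c, d) in E)
                if gain > best_gain:
                    best_gain = gain
                    best_swap = ((a, b), (c, d), new1, new2)
        if best_swap is None:
            return cur  # no swap improves the overlap any further
        old1, old2, new1, new2 = best_swap
        cur -= {old1, old2}
        cur |= {new1, new2}


print(greedy_swap([(0, 2), (1, 3)], [(0, 1), (2, 3)]))
```

Every swap removes one edge and adds one edge at each of the four endpoints, so the degree sequence, and with it k-degree anonymity, is unchanged. The loop stops because the overlap strictly increases with each swap.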

Page 29: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

RELAXED GRAPH CONSTRUCTION• The Greedy_Swap algorithm

Page 30: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

RELAXED GRAPH CONSTRUCTION

• The Priority algorithm is a simple modification of the ConstructGraph algorithm that directly constructs degree-anonymous graphs with similarly high edge intersection with the original graph, without using Greedy_Swap.

• It gives priority to already existing edges in the input graph G(V, E).

Page 31: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

RELAXED GRAPH CONSTRUCTION

Page 32: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

EXPERIMENTS

Page 33: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

EVALUATING DEGREE ANONYMIZATION ALGORITHMS

• The closer R is to 1, the better the performance of the Greedy algorithm.

Page 34: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

Evaluating anonymization cost L1(dA − d)

The smaller the value of L1(dA − d), the better the qualitative performance of the algorithm.

Page 35: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

Clustering Coefficient (CC):

We additionally compare the clustering coefficients of the anonymized graphs with the clustering coefficients of the original graphs.

Page 36: TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

Average Path Length (APL):