towards identity anonymization on graphs. introduction

Post on 24-Dec-2015

TOWARDS IDENTITY ANONYMIZATION ON GRAPHS

INTRODUCTION

• Removing the identities of the nodes before publishing graph or social-network data does not guarantee privacy.

• The structure of the graph itself, and in its most basic form the degrees of the nodes, can reveal the identities of individuals.

• We call a graph k-degree anonymous if, for every node v, there exist at least k−1 other nodes in the graph with the same degree as v.

MOTIVATION

• Social networks, online communities, peer-to-peer file sharing and telecommunication systems can be modelled as complex graphs.

• These graphs are of significant importance in various application domains such as marketing, psychology and homeland security.

• In a social network, nodes correspond to individuals or other social entities, and edges correspond to social relationships between them.

• http://www.yasiv.com/facebook

• https://apps.facebook.com/touchgraph

THE CHALLENGE

How can we minimally modify the graph to protect the identity of each individual involved without losing the information it carries?

CATEGORIES OF PRIVACY BREACHES IN SOCIAL NETWORKS

• Identity disclosure: the identity of the individual who is associated with a node is revealed.

• Link disclosure: sensitive relationships between two individuals are disclosed.

• Content disclosure: the privacy of the data associated with each node is breached, e.g., the email messages sent and/or received by the individuals in an email communication graph.

NOTES

• k-anonymization is used for content disclosure.

• In this paper, we focus on identity disclosure.

PROBLEM

• Given a graph G and an integer k, modify G via a set of edge-addition (or deletion) operations in order to construct a new k-degree anonymous graph Ĝ, in which every node has the same degree as at least k−1 other nodes.

CHALLENGE

We want to preserve the utility of the original graph while at the same time satisfying the degree-anonymity constraint.

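PROBLEM DEFINITION

• The graph is G(V, E), with n = |V| nodes.

• d_G is the degree sequence of G, a vector of size n.

• d_G(i) is the degree of the i-th node of G.

• The entries of d are ordered in decreasing order of the degrees they correspond to, that is, d(1) ≥ d(2) ≥ … ≥ d(n).

• For i < j, d[i, j] is the subsequence of d that contains elements i, i+1, …, j−1, j.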

PROBLEM DEFINITION

• Definition 1: A vector of integers v is k-anonymous if every distinct value in v appears at least k times.

Example: the vector v = [5, 5, 3, 3, 2, 2, 2] is 2-anonymous.

• Definition 2: A graph G(V, E) is k-degree anonymous if its degree sequence d_G is k-anonymous.

• This property prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes.
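
Definitions 1 and 2 can be checked mechanically. The following is a minimal sketch, assuming nodes are numbered 0..n−1 and the graph is given as a list of undirected edges; the function names are illustrative and do not come from the slides.

```python
from collections import Counter


def is_k_anonymous(values, k):
    """Definition 1: every distinct value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())


def degree_sequence(num_nodes, edges):
    """Degrees of a simple undirected graph, sorted in decreasing order."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree, reverse=True)


def is_k_degree_anonymous(num_nodes, edges, k):
    """Definition 2: the degree sequence of the graph is k-anonymous."""
    return is_k_anonymous(degree_sequence(num_nodes, edges), k)
```

For the example vector above, `is_k_anonymous([5, 5, 3, 3, 2, 2, 2], 2)` holds, while the same call with k = 3 does not, because the values 5 and 3 each appear only twice.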

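THE PROBLEM 1

• Input: a graph G(V, E) and an integer k.

• Output: a k-degree anonymous graph Ĝ(V, Ê).

• We restrict the graph-modification operations to edge additions, so that Ê ∩ E = E.

• The graph-anonymization cost, i.e., the distance between G and Ĝ, should be minimized.

• With edge additions only, this cost is the number of added edges: |Ê| − |E| = ½ · L1(d_Ĝ − d_G).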

THE PROBLEM 1

• We can naturally relax this requirement to Ê ∩ E ≈ E rather than Ê ∩ E = E.

• We call this problem Relaxed Graph Anonymization.

THE APPROACH

• 1. Starting from d, the degree sequence of G, we construct a new degree sequence d̂ that is k-anonymous and minimizes the degree-anonymization cost L1(d̂ − d).

• 2. Starting from d̂, we construct a graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E.

PROBLEM 2 FROM STEP 1 (DEGREE ANONYMIZATION)

• Input: d, the degree sequence of the graph G(V, E), and an integer k.

• Output: a k-anonymous sequence d̂ such that L1(d̂ − d) is minimized.
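
As a small illustration of the quantity minimized in Problem 2, the sketch below computes the L1 distance between an anonymized and an original degree sequence. It assumes both sequences list the same nodes in the same order; the name `l1_cost` is ours, not from the slides.

```python
def l1_cost(anonymized, original):
    """L1 distance between two degree sequences of equal length."""
    if len(anonymized) != len(original):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - b) for a, b in zip(anonymized, original))
```

For instance, raising d = [4, 3, 2, 1] to the 2-anonymous sequence [4, 4, 2, 2] changes two degrees by one each, so the cost is 2.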

PROBLEM 3 FROM STEP 2 (GRAPH CONSTRUCTION)

• Inputs: a graph G(V, E) and a k-anonymous degree sequence d̂.

• Output: a graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E, or Ê ∩ E ≈ E for the relaxed version.

DEGREE ANONYMIZATION

• We can construct a set of dynamic-programming (DP) equations that solve the Degree Anonymization problem.

• The running time is O(n²).

DEGREE ANONYMIZATION

• We can improve the running time of the DP algorithm from O(n²) to O(nk).
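
The DP equations themselves did not survive in this transcript, so the following is only a sketch of one standard formulation, under stated assumptions: degrees may only be increased (edge additions), consecutive entries of the sorted sequence are grouped, and every member of a group is raised to the group's highest degree. With `windowed=False` each prefix considers every split point, giving O(n²); with `windowed=True` only groups of size at most 2k−1 are considered, giving O(nk), which loses nothing because a group of 2k or more entries can be split in two without increasing the cost. All names are illustrative.

```python
def anonymize_degrees_dp(d, k, windowed=True):
    """Return (anonymized sequence, L1 cost) for d sorted in decreasing order."""
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")
    if any(d[i] < d[i + 1] for i in range(n - 1)):
        raise ValueError("d must be sorted in decreasing order")

    # Prefix sums make the cost of any group an O(1) lookup.
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def group_cost(i, j):
        # Cost of raising d[i..j] (0-based, inclusive) to the top degree d[i].
        return d[i] * (j - i + 1) - (prefix[j + 1] - prefix[i])

    cost = [float("inf")] * (n + 1)   # cost[i]: best cost for the first i entries
    last_start = [0] * (n + 1)        # start index of the last group in that optimum
    cost[0] = 0
    for i in range(k, n + 1):
        if windowed:
            # Last group d[t..i-1] has between k and 2k-1 entries.
            splits = list(range(max(k, i - 2 * k + 1), i - k + 1))
            if i < 2 * k:
                splits.append(0)      # the whole prefix forms a single group
        else:
            splits = [0] + list(range(k, i - k + 1))
        for t in splits:
            c = cost[t] + group_cost(t, i - 1)
            if c < cost[i]:
                cost[i], last_start[i] = c, t

    # Walk back through the chosen groups and raise each one to its top degree.
    anonymized = list(d)
    i = n
    while i > 0:
        t = last_start[i]
        anonymized[t:i] = [d[t]] * (i - t)
        i = t
    return anonymized, cost[n]
```

Both variants return the same optimal cost; only the number of split points examined per prefix differs.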

DEGREE ANONYMIZATION

• For completeness, we also give a Greedy linear-time alternative algorithm for the Degree Anonymization problem.

• The Greedy algorithm first forms a group consisting of the k highest-degree nodes. Then it checks whether it should merge the (k+1)-th node into the previously formed group or start a new group at position (k+1).

• To make this decision, the algorithm compares two costs: the cost of merging the (k+1)-th node into the current group and the cost of starting a new group at position (k+1).
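
The two costs are not shown in this transcript, so the sketch below fills them in under an assumption: merging costs the gap between the group's top degree and the candidate plus the cost of grouping the next k entries, while starting a new group costs grouping the candidate with the k−1 entries after it. The names `c_merge` and `c_new` and the handling of the tail of the sequence are illustrative choices, not taken from the slides.

```python
def anonymize_degrees_greedy(d, k):
    """Greedy grouping of a degree sequence d sorted in decreasing order."""
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")

    def group_cost(i, j):
        # Cost of raising d[i..j] (0-based, inclusive) to d[i]; 0 for an empty range.
        return sum(d[i] - d[l] for l in range(i, j + 1))

    out = list(d)
    start = 0
    while start < n:
        if n - start < 2 * k:
            end = n                       # too few entries left for two groups
        else:
            end = start + k               # current group is d[start..end-1]
            while end < n:
                if n - end < k:
                    end = n               # leftovers cannot form a group: absorb them
                    break
                c_merge = (d[start] - d[end]) + group_cost(end + 1, min(end + k, n - 1))
                c_new = group_cost(end, min(end + k - 1, n - 1))
                if c_merge > c_new:
                    break                 # cheaper to start a new group at `end`
                end += 1                  # otherwise merge d[end] into the group
        out[start:end] = [d[start]] * (end - start)
        start = end
    return out
```

Every group formed this way has at least k entries, so the returned sequence is always k-anonymous, although its cost is not guaranteed to be optimal.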

GRAPH CONSTRUCTION

• Inputs: the graph G(V, E) and the desired k-anonymous degree sequence d̂ (the output of the DP or Greedy algorithm).

• Output: a k-degree anonymous graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E.

REALIZABILITY OF DEGREE SEQUENCE

• A degree sequence d, with d(1) ≥ … ≥ d(n), is called realizable if and only if there exists a simple graph whose nodes have precisely this sequence of degrees.

• Lemma 1: A degree sequence d with d(1) ≥ d(2) ≥ … ≥ d(i) ≥ … ≥ d(n) and Σd(i) even is realizable if and only if

$$\sum_{i=1}^{l} d(i) \le l(l-1) + \sum_{i=l+1}^{n} \min\{l, d(i)\}, \quad \text{for every } 1 \le l \le n-1.$$
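
The condition of Lemma 1 (the Erdős-Gallai condition) translates directly into a check. This is a minimal sketch with an illustrative function name; it also tests l = n, which is redundant for sequences of two or more nodes but rejects a single node with non-zero degree.

```python
def is_realizable(d):
    """Check whether a degree sequence is the degree sequence of a simple graph."""
    d = sorted(d, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for l in range(1, n + 1):
        lhs = sum(d[:l])                                   # degrees of the l largest nodes
        rhs = l * (l - 1) + sum(min(l, x) for x in d[l:])  # edges they can possibly absorb
        if lhs > rhs:
            return False
    return True
```

For example, [3, 1, 1, 1] (a star) is realizable, while [3, 3, 1, 1] is not: for l = 2 the left side is 6 but the right side is only 4.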

THE CONSTRUCTGRAPH ALGORITHM:

The ConstructGraph algorithm takes as input the desired degree sequence d and outputs a graph with exactly this degree sequence, if such a graph exists. Otherwise it outputs “No”.
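
The slides do not reproduce the algorithm's steps, so the sketch below uses a Havel-Hakimi style construction as a stand-in: repeatedly take the node with the largest residual degree and connect it to the nodes with the next-largest residual degrees. The deterministic node choice, the function name, and returning `None` in place of “No” are assumptions made for illustration.

```python
def construct_graph(d):
    """Return a set of edges (u, v), u < v, realizing d, or None if impossible."""
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return None
    residual = list(d)          # degree each node still has to receive
    edges = set()
    while any(r > 0 for r in residual):
        v = max(range(n), key=lambda i: residual[i])
        need = residual[v]
        residual[v] = 0         # v is finished after this round
        candidates = sorted(
            (i for i in range(n) if i != v and residual[i] > 0),
            key=lambda i: residual[i],
            reverse=True,
        )
        if len(candidates) < need:
            return None         # not enough partners left: answer "No"
        for w in candidates[:need]:
            edges.add((min(v, w), max(v, w)))
            residual[w] -= 1
    return edges
```

Because a finished node has residual degree 0 and is never chosen again, no pair of nodes is connected twice, so the result is a simple graph.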

REALIZABILITY OF DEGREE SEQUENCES WITH CONSTRAINTS

• Notice that Lemma 1 is not directly applicable to the Graph Construction problem, because we also require that E ⊆ Ê.

• We therefore want an algorithm that constructs a degree-anonymous graph Ĝ that is a supergraph of G, if such a graph exists. We call this algorithm Supergraph; it is an extension of the ConstructGraph algorithm.

THE PROBING SCHEME

• If the Supergraph algorithm returns a graph Ĝ, then we guarantee that the least number of edge additions has been made.

• If Supergraph returns “No” or “Unknown”, we are content to tolerate some more edge additions in order to obtain a degree-anonymous graph. The Probing scheme forces the Supergraph algorithm to output the desired k-degree anonymous graph at a little extra cost.

RELAXED GRAPH CONSTRUCTION

• Most of the edges of the original graph appear in the degree-anonymous graph as well, but not necessarily all of them.

RELAXED GRAPH CONSTRUCTION

• The Greedy_Swap algorithm is a greedy heuristic that, given Ĝ0 and G, transforms Ĝ0 into Ĝ(V, Ê) with degree sequence d_Ĝ = d̂ = d_Ĝ0 and Ê ∩ E ≈ E.

• Here Ĝ0 is the output of the ConstructGraph algorithm. Although Ĝ0 is k-degree anonymous, its structure may be quite different from that of the original graph G(V, E).
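
The idea of a degree-preserving swap can be sketched as follows. This is not the Greedy_Swap procedure itself, whose details are not given here: it is a simplified version that exhaustively searches all pairs of edges, applies the swap that most increases the overlap with the original edge set, and stops when no swap helps. Function names are illustrative.

```python
from itertools import combinations


def _norm(u, v):
    """Store an undirected edge with its smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def greedy_swap(anonymized_edges, original_edges):
    """Swap edge pairs to increase overlap with original_edges, keeping all degrees."""
    current = {_norm(u, v) for u, v in anonymized_edges}
    original = {_norm(u, v) for u, v in original_edges}
    while True:
        best_gain, best_swap = 0, None
        for (a, b), (c, e) in combinations(sorted(current), 2):
            if len({a, b, c, e}) < 4:
                continue  # a valid swap needs four distinct endpoints
            for new1, new2 in ((_norm(a, c), _norm(b, e)), (_norm(a, e), _norm(b, c))):
                if new1 in current or new2 in current:
                    continue  # would create a duplicate edge
                removed = ((a, b) in original) + ((c, e) in original)
                added = (new1 in original) + (new2 in original)
                if added - removed > best_gain:
                    best_gain = added - removed
                    best_swap = ((a, b), (c, e), new1, new2)
        if best_swap is None:
            return current
        old1, old2, new1, new2 = best_swap
        current -= {old1, old2}
        current |= {new1, new2}
```

Each swap removes one edge at each of four nodes and adds one back, so every degree stays the same, and the loop terminates because the overlap strictly increases with every swap.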

RELAXED GRAPH CONSTRUCTION

• The Priority algorithm is a simple modification of the ConstructGraph algorithm that directly constructs degree-anonymous graphs with similarly high edge intersection with the original graph, without using Greedy_Swap.

• It gives priority to edges that already exist in the input graph G(V, E).

EXPERIMENTS

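EVALUATING DEGREE ANONYMIZATION ALGORITHMS

• R is the ratio of the anonymization cost of the Greedy algorithm to the optimal cost obtained by DP. The closer R is to 1, the better the performance of the Greedy algorithm.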

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

• Anonymization cost L1(dA − d): the smaller the value of L1(dA − d), the better the qualitative performance of the algorithm.

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

• Clustering Coefficient (CC): we additionally compare the clustering coefficients of the anonymized graphs with the clustering coefficients of the original graphs.
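
The slides do not say which variant of the clustering coefficient is used. As a sketch, the code below computes the average local clustering coefficient, assuming nodes numbered 0..n−1 and counting nodes of degree below 2 as 0; the function name is illustrative.

```python
def average_clustering(num_nodes, edges):
    """Average over all nodes of the fraction of neighbor pairs that are connected."""
    adj = [set() for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for v in range(num_nodes):
        nbrs = sorted(adj[v])
        deg = len(nbrs)
        if deg < 2:
            continue  # no neighbor pairs: contributes 0
        links = sum(
            1
            for i in range(deg)
            for j in range(i + 1, deg)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * links / (deg * (deg - 1))
    return total / num_nodes if num_nodes else 0.0
```

Running it on the original and on the anonymized edge list gives the two values to compare.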

EVALUATING GRAPH CONSTRUCTION ALGORITHMS

Average Path Length (APL):
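
The slide content for this metric is missing from the transcript. As a sketch of how it is typically computed, the code below averages breadth-first-search distances over all connected pairs of nodes; ignoring disconnected pairs is an assumption, and the function name is illustrative.

```python
from collections import deque


def average_path_length(num_nodes, edges):
    """Mean shortest-path length over all connected, unordered node pairs."""
    adj = [set() for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for source in range(num_nodes):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, steps in dist.items():
            if target > source:   # count each unordered pair once
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0
```

As with the clustering coefficient, the value for the anonymized graph can be compared with the value for the original graph.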
