TOWARDS IDENTITY ANONYMIZATION ON GRAPHS
INTRODUCTION
• Removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy.
• The structure of the graph itself, and in its basic form the degree of the nodes, can reveal the identities of individuals.
• We call a graph G k-degree anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v.
MOTIVATION
• Social networks, online communities, peer-to-peer file sharing and telecommunication systems can be modelled as complex graphs.
• These graphs are of significant importance in various application domains such as marketing, psychology and homeland security.
MOTIVATION
• In a social network, nodes correspond to individuals or other social entities, and edges correspond to social relationships between them.
• http://www.yasiv.com/facebook
• https://apps.facebook.com/touchgraph
THE CHALLENGE
How can we minimally modify the graph to protect the identity of each individual involved without losing information?
THE PRIVACY BREACHES IN SOCIAL NETWORKS: CATEGORIES
• Identity disclosure: the identity of the individual who is associated with the node is revealed.
• Link disclosure: sensitive relationships between two individuals are disclosed.
• Content disclosure: the privacy of the data associated with each node is breached, e.g., the email messages sent and/or received by the individuals in an email communication graph.
NOTES
• k-anonymization is used for content disclosure.
• In this paper, we focus on identity disclosure.
PROBLEM
• Given a graph G and an integer k, modify G via a set of edge-addition (or deletion) operations in order to construct a new k-degree anonymous graph Ĝ, in which every node has the same degree as at least k-1 other nodes.
CHALLENGE
We want to preserve the utility of the original graph while at the same time satisfying the degree-anonymity constraint.
PROBLEM DEFINITION
• The graph G(V, E)
• d_G: the degree sequence of G, a vector of size n = |V|
• d_G(i) is the degree of the i-th node of G
• Entries in d are ordered in decreasing order of the degrees they correspond to, that is, d(1) ≥ d(2) ≥ … ≥ d(n)
• d[i, j] is the subsequence of d that contains elements i, i+1, …, j
PROBLEM DEFINITION
• Definition 1
A vector of integers v is k-anonymous if every distinct value in v appears at least k times.
Example: the vector v = [5, 5, 3, 3, 2, 2, 2] is 2-anonymous.
• Definition 2
A graph G(V, E) is k-degree anonymous if the degree sequence of G, d_G, is k-anonymous.
• This property prevents the reidentification of individuals by adversaries with a priori knowledge of the degree of certain nodes.
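Definitions 1 and 2 can be checked mechanically. The following is a minimal sketch, assuming nodes are numbered 0..n-1 and the graph is given as a list of undirected edges; the function names are illustrative and do not come from the slides.

```python
from collections import Counter


def is_k_anonymous(values, k):
    """Definition 1: every distinct value appears at least k times."""
    return all(count >= k for count in Counter(values).values())


def degree_sequence(n, edges):
    """Degrees of nodes 0..n-1 of a simple undirected graph, in decreasing order."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg, reverse=True)


def is_k_degree_anonymous(n, edges, k):
    """Definition 2: the degree sequence of the graph is k-anonymous."""
    return is_k_anonymous(degree_sequence(n, edges), k)
```

For instance, a path on three nodes has degree sequence [2, 1, 1], so its middle node is uniquely identified by its degree and the graph is not 2-degree anonymous.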
PROBLEM 1 (GRAPH ANONYMIZATION)
• The input: a graph G(V, E) and an integer k.
• The output: a k-degree anonymous graph Ĝ(V, Ê).
• We restrict the graph modification operations to edge additions.
• The graph-anonymization cost should be minimized (minimizing the distance between Ĝ and G).
• Ê ∩ E = E (all original edges are kept)
PROBLEM 1 (GRAPH ANONYMIZATION)
• We can naturally relax this requirement to the one where Ê ∩ E ≈ E rather than Ê ∩ E = E.
• We call this Relaxed Graph Anonymization.
THE APPROACH
• 1- Starting from d, we construct a new degree sequence d̂ which is k-anonymous and minimizes the cost.
• 2- Starting from d̂, we construct the graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E.
PROBLEM 2 FROM STEP 1 (DEGREE ANONYMIZATION)
• Input is d (the degree sequence of the graph G(V, E)) and an integer k.
• Output is a k-anonymous sequence d̂ such that L1(d̂ - d) is minimized.
PROBLEM 3 FROM STEP 2 (GRAPH CONSTRUCTION)
• The inputs are G(V, E) and a k-anonymous degree sequence d̂.
• The output is a graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E, or Ê ∩ E ≈ E for the relaxed version.
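DEGREE ANONYMIZATION
• We can construct a set of dynamic-programming equations that solve the Degree Anonymization problem. Let I(d[i, j]) be the cost of raising every degree in d[i, j] to d(i), and let DA(d[1, i]) be the optimal cost of anonymizing the first i entries. That is,
$$I(d[i, j]) = \sum_{l=i}^{j} \big(d(i) - d(l)\big)$$
$$DA(d[1, i]) = I(d[1, i]) \quad \text{for } i < 2k$$
$$DA(d[1, i]) = \min\Big\{ \min_{k \le t \le i-k} \big\{ DA(d[1, t]) + I(d[t+1, i]) \big\},\; I(d[1, i]) \Big\} \quad \text{for } i \ge 2k$$
• The running time is O(n²).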
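DEGREE ANONYMIZATION
• We can improve the running time of the DP algorithm from O(n²) to O(nk): no group in an optimal solution needs more than 2k-1 nodes, so only the split points max{k, i-2k+1} ≤ t ≤ i-k have to be examined.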
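The dynamic program can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes a 0-indexed list sorted in decreasing order, only raises degrees, and restricts the last group to at most 2k-1 entries. The names `anonymize_degrees_dp`, `group_cost`, `cost` and `cut` are made up for the example.

```python
def anonymize_degrees_dp(d, k):
    """Return (k-anonymous sequence, L1 cost) for d sorted in decreasing order."""
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")
    if any(d[i] < d[i + 1] for i in range(n - 1)):
        raise ValueError("d must be sorted in decreasing order")

    # Prefix sums give the cost of any group in O(1).
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def group_cost(i, j):
        """Cost of raising d[i..j] (inclusive, 0-indexed) to d[i]."""
        return (j - i + 1) * d[i] - (prefix[j + 1] - prefix[i])

    inf = float("inf")
    cost = [inf] * (n + 1)  # cost[i]: optimal cost for the first i entries
    cut = [0] * (n + 1)     # cut[i]: 0-indexed start of the last group
    for i in range(k, n + 1):
        if i < 2 * k:
            # Too short to split into two groups of size >= k.
            cost[i] = group_cost(0, i - 1)
            continue
        # The last group d[t..i-1] has between k and 2k-1 entries.
        for t in range(max(k, i - 2 * k + 1), i - k + 1):
            c = cost[t] + group_cost(t, i - 1)
            if c < cost[i]:
                cost[i], cut[i] = c, t

    # Walk the cut points backwards to rebuild the anonymized sequence.
    anonymized = list(d)
    end = n
    while end > 0:
        start = cut[end]
        for idx in range(start, end):
            anonymized[idx] = d[start]
        end = start
    return anonymized, cost[n]
```

With d = [4, 3, 2, 1] and k = 2 the sketch groups the entries as {4, 3} and {2, 1}, giving [4, 4, 2, 2] at cost 2.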
DEGREE ANONYMIZATION
• For completeness, we also give a Greedy linear-time alternative algorithm for the Degree Anonymization problem.
• The Greedy algorithm first forms a group consisting of the first k highest-degree nodes. Then it checks whether it should merge the (k+1)th node into the previously formed group or start a new group at position (k + 1).
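• To make this decision the algorithm computes the following two costs: C_merge = (d(1) - d(k+1)) + I(d[k+2, 2k+1]) and C_new = I(d[k+1, 2k]), where I(d[i, j]) is the cost of raising every degree in d[i, j] to d(i). If C_merge is greater than C_new, a new group starts at the (k+1)th node; otherwise the node is merged into the previous group and the (k+2)th node is considered next.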
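The greedy grouping can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: it uses a 0-indexed list sorted in decreasing order, compares the merge cost with the cost of starting a new group, and merges all remaining nodes when fewer than k are left (the slides do not spell out this boundary case). All names are hypothetical.

```python
def anonymize_degrees_greedy(d, k):
    """Return (k-anonymous sequence, L1 cost) for d sorted in decreasing order."""
    n = len(d)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(d)")

    def group_cost(i, j):
        """Cost of raising d[i..j] (inclusive) to d[i]; 0 for an empty range."""
        if i > j:
            return 0
        return sum(d[i] - d[l] for l in range(i, j + 1))

    out = list(d)
    start = 0
    while start < n:
        end = start + k  # current group is d[start:end]
        while end < n:
            if n - end < k:
                # Assumed boundary rule: too few nodes left for their own group.
                end = n
                break
            # Cost of merging node `end` and grouping the next k nodes after it.
            c_merge = (d[start] - d[end]) + group_cost(end + 1, min(end + k, n - 1))
            # Cost of starting a new group of k nodes at `end`.
            c_new = group_cost(end, end + k - 1)
            if c_merge > c_new:
                break
            end += 1
        for idx in range(start, end):
            out[idx] = d[start]
        start = end
    return out, sum(out) - sum(d)
```

On d = [6, 3, 3, 1, 1] with k = 2 this sketch returns [6, 6, 3, 3, 3] at cost 7, whereas grouping {6, 3, 3} and {1, 1} costs only 6, which illustrates that the greedy choice is not always optimal.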
GRAPH CONSTRUCTION
• Inputs are G(V, E) and the desired k-anonymous degree sequence d̂ (which is the output of the DP or Greedy algorithm).
• Output is a k-degree anonymous graph Ĝ(V, Ê) such that d_Ĝ = d̂ and Ê ∩ E = E.
REALIZABILITY OF DEGREE SEQUENCE
• A degree sequence d, with d(1) ≥ … ≥ d(n), is called realizable if and only if there exists a simple graph whose nodes have precisely this sequence of degrees.
• Lemma 1: A degree sequence d with d(1) ≥ d(2) ≥ … ≥ d(i) ≥ … ≥ d(n) and Σd(i) even is realizable if and only if
$$\sum_{i=1}^{l} d(i) \le l(l-1) + \sum_{i=l+1}^{n} \min\{l, d(i)\}, \quad \text{for every } 1 \le l \le n-1.$$
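The condition of Lemma 1 (the Erdős–Gallai inequality) translates directly into a check. The sketch below is illustrative; the function name is made up, and it also tests l = n, which is harmless and covers the single-node corner case.

```python
def is_realizable(d):
    """Check the parity condition and the inequality of Lemma 1 for every l."""
    d = sorted(d, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for l in range(1, n + 1):
        lhs = sum(d[:l])
        rhs = l * (l - 1) + sum(min(l, x) for x in d[l:])
        if lhs > rhs:
            return False
    return True
```

For example, [3, 3, 1, 1] has an even sum but fails the inequality at l = 2 (6 > 2 + 2), so no simple graph has these degrees.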
THE CONSTRUCTGRAPH ALGORITHM:
Takes as input the desired degree sequence d and outputs a graph with exactly this degree sequence, if such a graph exists. Otherwise it outputs “No”.
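A graph with a prescribed degree sequence can be built in the Havel–Hakimi style: repeatedly take a node and connect it to the nodes with the largest remaining (residual) degrees. The sketch below is an assumption about how such a procedure might look, not the slides' pseudocode; it always picks the node with the largest residual degree, returns a set of edges over nodes 0..n-1, and returns None where the slides' algorithm would answer “No”.

```python
def construct_graph(d):
    """Return a set of edges (u, v), u < v, realizing d, or None if impossible."""
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return None
    residual = list(d)
    edges = set()
    while True:
        # Node with the largest residual degree (a choice made for this sketch).
        v = max(range(n), key=lambda i: residual[i])
        if residual[v] == 0:
            return edges  # every degree requirement is satisfied
        need, residual[v] = residual[v], 0
        # Candidate neighbours, highest residual degree first.
        candidates = sorted(
            (u for u in range(n) if u != v and residual[u] > 0),
            key=lambda u: residual[u],
            reverse=True,
        )
        if len(candidates) < need:
            return None  # corresponds to the answer "No"
        for u in candidates[:need]:
            edges.add((min(u, v), max(u, v)))
            residual[u] -= 1
```

Because a processed node has residual degree 0 and is never selected again, no edge is created twice and the result is a simple graph.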
REALIZABILITY OF DEGREE SEQUENCES WITH CONSTRAINTS
• Notice that Lemma 1 is not directly applicable to the Graph Construction problem, because we also require that E ⊆ Ê.
• Here we want to devise an algorithm for constructing a degree-anonymous graph Ĝ which is a supergraph of G, if such a graph exists. We call this algorithm Supergraph; it is an extension of the ConstructGraph algorithm.
THE PROBING SCHEME
• If the Supergraph algorithm returns a graph Ĝ, then we guarantee that the least number of edge additions has been made.
• If Supergraph returns “No” or “Unknown”, we tolerate some more edge additions: the Probing scheme forces the Supergraph algorithm to output the desired k-degree anonymous graph at a small extra cost.
RELAXED GRAPH CONSTRUCTION
Most of the edges of the original graph appear in the degree-anonymous graph as well, but not necessarily all of them.
RELAXED GRAPH CONSTRUCTION
• The Greedy_Swap algorithm
• is a greedy heuristic that, given Ĝ0 and G, transforms Ĝ0 into Ĝ(V, Ê) with degree sequence d_Ĝ = d̂ = d_Ĝ0 and Ê ∩ E ≈ E.
• Here Ĝ0 is the output of the ConstructGraph algorithm. Although Ĝ0 is k-degree anonymous, its structure may be quite different from that of the original graph G(V, E).
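The idea of swapping edges while keeping every degree fixed can be sketched as follows. This is a simplified illustration under stated assumptions, not the slides' Greedy_Swap: it scans all pairs of edges exhaustively (practical only for small graphs), applies the swap with the largest gain in overlap with E, and stops when no swap helps. The names `greedy_swap`, `norm` and `max_iters` are hypothetical.

```python
from itertools import combinations


def norm(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def greedy_swap(edges_hat0, edges_orig, max_iters=1000):
    """Swap edge pairs of the input graph to increase its overlap with edges_orig.

    A swap replaces (i, j), (l, m) by (i, l), (j, m) or by (i, m), (j, l),
    so every node keeps its degree.
    """
    current = {norm(*e) for e in edges_hat0}
    target = {norm(*e) for e in edges_orig}
    for _ in range(max_iters):
        best_gain, best = 0, None
        for (i, j), (l, m) in combinations(sorted(current), 2):
            if len({i, j, l, m}) < 4:
                continue  # the two edges share a node; a swap would not be valid
            for a, b in (((i, l), (j, m)), ((i, m), (j, l))):
                a, b = norm(*a), norm(*b)
                if a in current or b in current:
                    continue  # would create a parallel edge
                gain = ((a in target) + (b in target)
                        - ((i, j) in target) - ((l, m) in target))
                if gain > best_gain:
                    best_gain, best = gain, ((i, j), (l, m), a, b)
        if best is None:
            break  # no swap increases the intersection any further
        e1, e2, a, b = best
        current -= {e1, e2}
        current |= {a, b}
    return current
```

Each applied swap strictly increases the number of shared edges, so the loop terminates; the result keeps the degree sequence of the starting graph.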
RELAXED GRAPH CONSTRUCTION
• The Priority algorithm
• is a simple modification of the ConstructGraph algorithm that directly constructs degree-anonymous graphs with similarly high edge intersection with the original graph, without using Greedy_Swap.
• It gives priority to already existing edges in the input graph G(V, E).
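One possible reading of this priority rule is sketched below. The slides do not give the exact rule, so the ordering used here (neighbours that are already adjacent in G first, then highest residual degree) is an assumption, and the function name is hypothetical. Because existing edges are preferred over high residual degrees, this sketch may return None even when some graph with the requested degrees exists.

```python
def priority_construct(d_hat, original_edges):
    """Build a graph with degree sequence d_hat, preferring edges of the input graph.

    Returns a set of edges (u, v) with u < v, or None if the construction gets stuck.
    """
    n = len(d_hat)
    if any(x < 0 for x in d_hat) or sum(d_hat) % 2 != 0:
        return None
    adj = [set() for _ in range(n)]
    for u, v in original_edges:
        adj[u].add(v)
        adj[v].add(u)
    residual = list(d_hat)
    edges = set()
    while True:
        v = max(range(n), key=lambda i: residual[i])
        if residual[v] == 0:
            return edges
        need, residual[v] = residual[v], 0
        # Assumed priority: original neighbours of v first, then residual degree.
        candidates = sorted(
            (u for u in range(n) if u != v and residual[u] > 0),
            key=lambda u: (u in adj[v], residual[u]),
            reverse=True,
        )
        if len(candidates) < need:
            return None
        for u in candidates[:need]:
            edges.add((min(u, v), max(u, v)))
            residual[u] -= 1
```

On a 4-cycle with target degrees [2, 2, 2, 2], this sketch reproduces the original cycle exactly, whereas ordering by residual degree alone would connect node 0 to nodes 1 and 2.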
EXPERIMENTS
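EVALUATING DEGREE ANONYMIZATION ALGORITHMS
• R is the ratio of the cost achieved by the Greedy algorithm to the optimal cost achieved by the DP algorithm. The closer R is to 1, the better the performance of the Greedy algorithm.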
EVALUATING GRAPH CONSTRUCTION ALGORITHMS
Evaluating anonymization cost L1(dA - d), where dA is the degree sequence of the graph produced by algorithm A:
The smaller the value of L1(dA - d), the better the qualitative performance of the algorithm.
EVALUATING GRAPH CONSTRUCTION ALGORITHMS
Clustering Coefficient (CC):
We additionally compare the clustering coefficients of the anonymized graphs with the clustering coefficients of the original graphs.
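As an illustration of this measure, the sketch below computes the average local clustering coefficient of a graph given as an edge list over nodes 0..n-1. The slides do not say which variant of the clustering coefficient was used, so the averaging over all nodes (with nodes of degree below 2 contributing 0) is an assumption, as is the function name.

```python
def average_clustering(n, edges):
    """Average over all nodes of (links among neighbours) / (possible links)."""
    if n == 0:
        return 0.0
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for v in range(n):
        k = len(adj[v])
        if k < 2:
            continue  # assumed convention: such nodes contribute 0
        links = sum(1 for a in adj[v] for b in adj[v] if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n
```

Comparing `average_clustering` on the original and the anonymized edge lists gives one number per graph, which can then be tracked as k grows.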
EVALUATING GRAPH CONSTRUCTION ALGORITHMS
Average Path Length (APL):
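For illustration, the average path length of an unweighted graph can be computed with a breadth-first search from every node. The sketch below assumes nodes 0..n-1 and averages over connected pairs only; how disconnected pairs were handled in the experiments is not stated in the slides, and the function name is made up.

```python
from collections import deque


def average_path_length(n, edges):
    """Mean shortest-path distance over all connected pairs of distinct nodes."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        # Count each unordered pair once.
        for t, steps in dist.items():
            if t > s:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0
```

As with the clustering coefficient, the value for the anonymized graph can be compared with the value for the original graph.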