Xiaowei Ying Xintao Wu
Univ. of North Carolina at Charlotte
2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada
Graph Generation with Prescribed Feature Constraints
• Motivation
-- Generate graphs for publishing social network
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint
Framework
Publishing social networks: Privacy vs. Utility
Privacy issue: anonymization is not enough
-- Active/passive attacks [Backstrom et al., WWW07]
-- Subgraph attacks [M. Hay et al., VLDB08]
K-anonymity in social networks [B. Zhou et al., ICDE08] [K. Liu et al., SIGMOD08]
Randomization approach
-- Local topology is changed: re-identification risk is reduced
-- Links are randomized: link privacy is protected
Motivation
Publishing social networks: Privacy vs. Utility
Randomization approach
-- Pure randomization can’t preserve many topological features. [Ying SDM08]
Motivation
$\lambda_1$ -- the largest eigenvalue of the adjacency matrix
$\mu_2$ -- the second smallest eigenvalue of the Laplacian matrix
$h$ -- harmonic mean of the shortest distance
$C$ -- transitivity
How to generate graphs preserving data utility?
Generate graphs for testing data mining results
-- Generate a set of graph samples s.t. a feature of the samples satisfies a specified distribution.
Motivation
1. Accessibility: can reach every graph with the given degree sequence
2. Uniformity: all such graphs have the same probability of being generated
3. Application: empirically learning the properties of graph features given the degree seq.
Switch and Uniform Graph Generator
Uniform switch procedure [Taylor, 1981]
-- Preserves the degree sequence/distribution
[Figure: a graph on nodes 1-5 before and after one switch]
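The switch procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the edge representation (sorted tuples in a set), and the 5-node example graph are all assumptions made for the sketch. One switch picks two edges a-b and c-d and rewires them to a-d and c-b, rejecting the move if it would create a self-loop or a duplicate edge, so every node keeps its degree.

```python
import random


def norm(u, v):
    """Store each undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)


def switch_once(edges, rng):
    """Attempt one switch in place; return True if it was applied."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible rewirings at random
    new1, new2 = norm(a, d), norm(c, b)
    # Reject switches that would create a self-loop or a duplicate edge.
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return False
    edges.difference_update({norm(a, b), norm(c, d)})
    edges.update({new1, new2})
    return True


def degree_sequence(edges):
    """Map each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical 5-node graph (not the one drawn on the slide).
graph = {(1, 2), (1, 3), (1, 5), (2, 5), (3, 4), (4, 5)}
before = degree_sequence(graph)
rng = random.Random(0)
for _ in range(200):
    switch_once(graph, rng)
after = degree_sequence(graph)
```

Rejected attempts leave the graph unchanged, so the degree sequence is the same after any number of attempts.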
How to generate a graph:
1. with the given degree sequence
2. with the feature range constraint (FRC):
$S(\tilde{G}) \in R$, where $R = [s_1, s_2]$ and $S(G) \in R$
Graph Generator with FRC
Uniformity for accessible graphs
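One way to combine the switch procedure with a feature range constraint is to propose switches and keep only those whose result stays inside the range. The sketch below assumes this rejection scheme; the slides do not spell out the algorithm, so treat it as an illustration only. The triangle count stands in for the feature S, and the names `propose_switch`, `generate_with_frc`, and the example graph are hypothetical.

```python
import random
from itertools import combinations


def norm(u, v):
    """Store each undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)


def triangles(edges):
    """Stand-in feature S: the number of triangles in the graph."""
    nodes = sorted({u for e in edges for u in e})
    return sum(1 for a, b, c in combinations(nodes, 3)
               if {(a, b), (a, c), (b, c)} <= edges)


def propose_switch(edges, rng):
    """Return the edge set after one random switch, or None if invalid."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:
        c, d = d, c
    new1, new2 = norm(a, d), norm(c, b)
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return None
    return (edges - {norm(a, b), norm(c, d)}) | {new1, new2}


def generate_with_frc(start, feature, lo, hi, steps, rng):
    """Random walk over graphs that never leaves the range [lo, hi]."""
    if not lo <= feature(start) <= hi:
        raise ValueError("the starting graph must satisfy the constraint")
    current = set(start)
    for _ in range(steps):
        candidate = propose_switch(current, rng)
        # Keep the switch only if the feature stays inside the range.
        if candidate is not None and lo <= feature(candidate) <= hi:
            current = candidate
    return current


# Hypothetical starting graph with exactly one triangle (1-2-5).
start = {(1, 2), (1, 3), (1, 5), (2, 5), (3, 4), (4, 5)}
result = generate_with_frc(start, triangles, 1, 2, 300, random.Random(1))
```

Only graphs reachable from the starting graph through in-range switches can be produced, which is one reading of "uniformity for accessible graphs".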
Privacy risks introduced by FRC
Attackers know:
1. The released graph preserves the true degree sequence
2. The true graph has its feature S within range R
What can attackers do?
Graph Generator and Privacy Issues
With the released graph, attackers can explore the graph space
Graph Generator and Privacy Issues
Graph space: $\{G : G \text{ has the given degree seq. and } S(G) \in R\}$
Uniformly sample the graph space:
$N$ samples: $G_1, G_2, \ldots, G_N \in \text{Space}$
Attacker’s confidence on link $(i,j)$:
$$P(a_{ij} = 1 \mid \text{Space}) \approx \frac{1}{N} \sum_{k=1}^{N} G_k(i,j)$$
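The attacker's confidence on a link can be estimated as the fraction of sampled graphs that contain it. The sketch below assumes each sample is given as a 0/1 adjacency matrix; the helper names and the four toy samples (4-node graphs that all have degree sequence 1,1,1,1) are hypothetical.

```python
import numpy as np


def adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    a = np.zeros((n, n), dtype=int)
    for u, v in edges:
        a[u, v] = a[v, u] = 1
    return a


def link_confidence(samples):
    """Entry (i, j) is the fraction of samples that contain link (i, j)."""
    return np.mean(np.stack(samples), axis=0)


# Four hypothetical samples, all with the same degree sequence.
samples = [
    adjacency([(0, 1), (2, 3)], 4),
    adjacency([(0, 1), (2, 3)], 4),
    adjacency([(0, 2), (1, 3)], 4),
    adjacency([(0, 1), (2, 3)], 4),
]
conf = link_confidence(samples)
print(conf[0, 1], conf[0, 2], conf[0, 3])  # 0.75 0.25 0.0
```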
[Figure: network features $\lambda_1$, $\mu_2$, $h$, $C$]
Network of US political books (105 nodes, 441 edges)
Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".
http://www-personal.umich.edu/~mejn/netdata/
FRC Can Jeopardize Privacy--A real network example
The attacker simply takes t node pairs with the highest probabilities as candidate links
Polbook network: 105 nodes, 441 edges
FRC Can Jeopardize Privacy--A real network example
Some features jeopardize privacy; others do not
Top candidates can seriously jeopardize privacy!!
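The attack described on this slide can be sketched as: rank all node pairs by estimated link probability, keep the top t, and measure how many are true links. The confidence matrix and true graph below are hypothetical toy values, and the precision measure is an assumption about how the attack would be scored.

```python
import numpy as np


def top_candidates(conf, t):
    """Return the t node pairs (i < j) with the highest confidence."""
    n = conf.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    # Stable sort on the negated values: descending, ties keep pair order.
    order = np.argsort(-conf[rows, cols], kind="stable")[:t]
    return [(int(rows[k]), int(cols[k])) for k in order]


def precision(candidates, true_adj):
    """Fraction of candidate pairs that are links in the true graph."""
    hits = sum(int(true_adj[i, j]) for i, j in candidates)
    return hits / len(candidates)


# Hypothetical confidence matrix and true graph on 4 nodes.
conf = np.array([
    [0.00, 0.75, 0.25, 0.00],
    [0.75, 0.00, 0.00, 0.25],
    [0.25, 0.00, 0.00, 0.75],
    [0.00, 0.25, 0.75, 0.00],
])
true_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
cands = top_candidates(conf, 2)
print(cands, precision(cands, true_adj))  # [(0, 1), (2, 3)] 1.0
```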
FRC Can Jeopardize Privacy-- More real network examples
Polbook network: 105 nodes, 441 edges
Enron email network: 151 nodes, 869 edges
FRC Can Jeopardize Privacy-- A theoretical result
FRC Can Jeopardize Privacy-- A theoretical result
Conclusion: If the FRC specifies a sub-space close to the true graph, privacy is seriously breached
[Figure: the space of graphs with the given degree seq., showing the true graph, the feature range constraint, and a distance d]
Graph Generator with FDC
Feature Distribution Constraint (FDC)
Uniform generator:
• gives the natural distribution of feature S, which is highly skewed in the range
How to generate graphs
• with the given degree seq.
• whose feature values follow the target distribution g(x)?
Natural distribution f(x)
Target distribution g(x)
Based on the Metropolis-Hastings method
Acceptance ratio depends on the target distr. g(x) and the natural distr. f(x)
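One standard way to make the acceptance ratio depend on g and f is sketched below; the slides do not give the exact formula, so this is an assumption. With a symmetric proposal (such as a switch), a move from feature value s to s' is accepted with probability min(1, [g(s')/f(s')] / [g(s)/f(s)]). Each state is then weighted by g/f, which cancels the natural distribution and leaves g. The six-state toy space standing in for the graph space, and all names, are hypothetical; f and g are assumed positive on the range.

```python
import numpy as np


def acceptance_probability(s_cur, s_new, f, g):
    """Metropolis-Hastings acceptance for a symmetric proposal."""
    w_cur = g[s_cur] / f[s_cur]
    w_new = g[s_new] / f[s_new]
    return min(1.0, w_new / w_cur)


# Toy state space: six states with feature values 0, 0, 0, 1, 1, 2.
feature = [0, 0, 0, 1, 1, 2]
f = {0: 3 / 6, 1: 2 / 6, 2: 1 / 6}   # natural distribution of the feature
g = {0: 0.2, 1: 0.3, 2: 0.5}         # target distribution of the feature

# Transition matrix: propose any state uniformly, then accept or stay.
n = len(feature)
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            acc = acceptance_probability(feature[i], feature[j], f, g)
            P[i, j] = acc / n
    P[i, i] = 1.0 - P[i].sum()

# Claimed stationary distribution: each state weighted by g/f.
weights = np.array([g[s] / f[s] for s in feature])
pi = weights / weights.sum()
feature_dist = {
    s: float(sum(pi[k] for k in range(n) if feature[k] == s)) for s in g
}
```

In this toy case `pi` is stationary for `P`, and summing it over states with equal feature value recovers g. For real graphs f is not known in closed form and would have to be estimated, for example from uniform samples.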
Graph Generator with FDC
Graph Generator with FDC
Target distribution:
Natural distribution:
Evaluation
Summary
Graph generator with feature range constraint
Attackers can sample the graph space near the true graph and breach privacy.
Graph generator with feature distribution constraint
Generate a set of graph samples for statistical testing
Questions?
Acknowledgments
This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.
Thank You!
Example: graphs with degree sequence {3,2,2,2,3}.
Are nodes 1 and 5 connected?
Graph Generator and Privacy Issues
[Figure: true graph and published graph]
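For a degree sequence this small, the question can be answered by brute force. The sketch below enumerates every simple labeled graph on nodes 1-5 with degrees {3,2,2,2,3} and counts how many contain the link between nodes 1 and 5. The function name is hypothetical, and the assumption is that only simple graphs (no self-loops, no multi-edges) are considered.

```python
from itertools import combinations


def graphs_with_degrees(target):
    """Enumerate all simple graphs whose node degrees match `target`."""
    nodes = sorted(target)
    pairs = list(combinations(nodes, 2))
    n_edges = sum(target.values()) // 2
    found = []
    for edges in combinations(pairs, n_edges):
        deg = dict.fromkeys(nodes, 0)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg == target:
            found.append(set(edges))
    return found


target = {1: 3, 2: 2, 3: 2, 4: 2, 5: 3}
space = graphs_with_degrees(target)
with_link = sum((1, 5) in g for g in space)
print(len(space), with_link)  # 7 6
```

Six of the seven graphs contain link (1,5), so an attacker who knows only the degree sequence already has confidence 6/7 that nodes 1 and 5 are connected.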
Graph Generator with FDC
Problem of generator with FRC:
Uniform generator:
•gives the natural distribution of feature S
•highly skewed in the range
•generates biased feature value
[Figure: feature value of the real-world graph within the range]