Xiaowei Ying Xintao Wu Univ. of North Carolina at Charlotte 2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada Graph Generation with Prescribed Feature Constraints


Xiaowei Ying Xintao Wu

Univ. of North Carolina at Charlotte

2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada

Graph Generation with Prescribed Feature Constraints

Framework

• Motivation
-- Generate graphs for publishing social network
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint

Motivation

Publishing social networks: Privacy vs. Utility

• Privacy issue: anonymization is not enough
-- Active/passive attacks [Backstrom et al., WWW07]
-- Subgraph attacks [M. Hay et al., VLDB08]
• K-anonymity in social networks [B. Zhou et al., ICDE08] [K. Liu et al., SIGMOD08]
• Randomization approach
-- Local topology is changed: reduces re-identification risk
-- Links are randomized: link privacy is protected

Motivation

Publishing social networks: Privacy vs. Utility

Randomization approach
-- Pure randomization can't preserve many topological features [Ying SDM08]:

$\lambda_1$ -- the largest eigenvalue of the adjacency matrix
$\mu_2$ -- the second smallest eigenvalue of the Laplacian matrix
$h$ -- harmonic mean of shortest distance
$C$ -- transitivity

How to generate graphs preserving data utility?

Motivation

Generate graphs for testing data mining results
-- Generate a set of graph samples s.t. a feature of the samples satisfies a specified distribution.


Switch and Uniform Graph Generator

Uniform switch procedure [Taylor, 1981]
-- Preserves the degree sequence/distribution

[Figure: a five-node graph (nodes 1-5) before and after a switch]

1. Accessibility: can access all the graphs with the given degree sequence
2. Uniformity: all such graphs have the same probability of being generated
3. Application: empirically learning the properties of graph features given the degree seq.
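A minimal sketch of one switch step, assuming the usual reading of the procedure: pick two edges (a,b) and (c,d) at random and rewire them to (a,d) and (c,b), rejecting the move if it would create a self-loop or a duplicate edge. The function names and the five-node example graph are illustrative and not taken from the slides.

```python
import random

def degree_sequence(edges):
    """Map each node to its degree; `edges` is a set of (u, v) tuples with u < v."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def switch_step(edges, rng):
    """Attempt one switch in place; return True if it was applied."""
    e1, e2 = rng.sample(sorted(edges), 2)
    (a, b), (c, d) = e1, e2
    if rng.random() < 0.5:          # choose one of the two possible rewirings
        c, d = d, c
    new1 = (min(a, d), max(a, d))
    new2 = (min(c, b), max(c, b))
    # Reject switches that would create a self-loop or a duplicate edge.
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return False
    edges.difference_update({e1, e2})
    edges.update({new1, new2})
    return True

# Hypothetical five-node graph with degree sequence {3,2,2,2,3}.
graph = {(1, 2), (1, 3), (1, 5), (2, 4), (3, 5), (4, 5)}
rng = random.Random(0)
for _ in range(100):
    switch_step(graph, rng)
print(degree_sequence(graph))       # every node keeps its original degree
```

Each node touched by a switch loses one edge and gains one, which is why the degree sequence is preserved no matter how many steps are run.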

Graph Generator with FRC

How to generate a graph:
1. with the given degree sequence
2. with the feature range constraint (FRC): $S(\tilde{G}) \in R$, where $\tilde{G}$ is the generated graph and the range $R$ is an interval of feature values containing $S(G)$

-- Uniformity for accessible graphs
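One plausible way to combine the switch procedure with a feature range constraint is sketched below: propose a switch and keep it only if the feature of the resulting graph stays inside the range. This is an assumption about how such a generator could work, not the authors' stated algorithm; transitivity is used as the feature S, and all names, the range [0.2, 0.5], and the example graph are hypothetical.

```python
import random
from itertools import combinations

def propose_switch(edges, rng):
    """Return a new edge set with one switch applied, or None if the switch is invalid."""
    e1, e2 = rng.sample(sorted(edges), 2)
    (a, b), (c, d) = e1, e2
    if rng.random() < 0.5:
        c, d = d, c
    new1 = (min(a, d), max(a, d))
    new2 = (min(c, b), max(c, b))
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return None
    return (edges - {e1, e2}) | {new1, new2}

def transitivity(edges):
    """3 * (number of triangles) / (number of connected triples)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    triples = sum(len(n) * (len(n) - 1) // 2 for n in nbrs.values())
    # For each centre node, count neighbour pairs that are themselves linked.
    closed = sum(1 for n in nbrs.values()
                 for x, y in combinations(sorted(n), 2) if y in nbrs[x])
    return closed / triples if triples else 0.0

def generate_with_frc(edges, feature, lo, hi, steps, rng):
    """Run `steps` switch attempts, rejecting any that leave the range [lo, hi]."""
    current = set(edges)
    for _ in range(steps):
        proposal = propose_switch(current, rng)
        if proposal is not None and lo <= feature(proposal) <= hi:
            current = proposal
    return current

start = {(1, 2), (1, 3), (1, 5), (2, 4), (3, 5), (4, 5)}
released = generate_with_frc(start, transitivity, 0.2, 0.5, 200, random.Random(1))
print(sorted(released), transitivity(released))
```

Recomputing the feature from scratch at every step is the simplest choice for a sketch; a feature that can be updated incrementally after a switch would be cheaper on large graphs.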


Graph Generator and Privacy Issues

Privacy risks introduced by FRC

Attackers know:
1. The released graph preserves the true degree sequence
2. The true graph has its feature S within range R

What can attackers do?
-- With the released graph, attackers can explore the graph space

Graph Generator and Privacy Issues

Graph space: {G: with the given degree seq. & $S(G) \in R$}

Uniformly sample the graph space:
N samples: $G_1, G_2, \dots, G_N \in \text{Space}$

Attacker's confidence on link (i,j):
$P(a_{ij} = 1 \mid \text{Space}) \approx \frac{1}{N} \sum_{k=1}^{N} G_k(i,j)$

where $G_k(i,j)$ is 1 if sample $G_k$ contains link (i,j) and 0 otherwise.
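The attacker's confidence is just the fraction of sampled graphs that contain a given link. A small sketch of that estimate follows; the three sample graphs are hand-written for illustration (they share the degree sequence {3,2,2,2,3}) and the function name is hypothetical.

```python
from collections import Counter

def link_confidence(samples):
    """Estimate P(a_ij = 1 | Space) as the fraction of samples containing link (i, j).

    `samples` is a list of edge sets; each edge is a tuple (i, j) with i < j.
    Pairs that never appear are omitted (their estimate is 0).
    """
    counts = Counter()
    for edges in samples:
        counts.update(edges)
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}

# Three hypothetical samples from the graph space.
samples = [
    {(1, 2), (1, 3), (1, 5), (2, 4), (3, 5), (4, 5)},
    {(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 5)},
    {(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5)},
]
confidence = link_confidence(samples)
print(confidence[(1, 5)])
```

With only three samples the estimate is crude; in practice N would need to be large enough for the averages to stabilise.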

FRC Can Jeopardize Privacy -- A real network example

Network of US political books (105 nodes, 441 edges)

Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".

http://www-personal.umich.edu/~mejn/netdata/

[Figure legend: $\lambda_1$, $\mu_2$, $C$, $h$]

FRC Can Jeopardize Privacy -- A real network example

Polbook network: 105 nodes, 441 edges

The attacker simply takes the t node pairs with the highest probabilities as candidate links.

• Some features jeopardize privacy, and some others do not
• Top candidates can seriously jeopardize privacy!
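The attack step can be sketched as a sort over the estimated link probabilities. The confidence values and the true edge set below are made up for illustration, and measuring success as the fraction of candidates that are true links is an assumption about how the attack would be scored.

```python
def top_candidates(confidence, t):
    """Return the t node pairs with the highest estimated link probability."""
    ranked = sorted(confidence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:t]]

def precision(candidates, true_edges):
    """Fraction of candidate links that really exist in the true graph."""
    return sum(1 for pair in candidates if pair in true_edges) / len(candidates)

# Hypothetical attacker estimates and true links.
confidence = {(1, 5): 0.9, (1, 3): 0.8, (2, 4): 0.6, (2, 3): 0.1}
true_edges = {(1, 5), (2, 4), (3, 5)}
candidates = top_candidates(confidence, 2)
print(candidates, precision(candidates, true_edges))
```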

FRC Can Jeopardize Privacy -- More real network examples

• Polbook network: 105 nodes, 441 edges
• Enron email network: 151 nodes, 869 edges

FRC Can Jeopardize Privacy -- A theoretical result

Conclusion: If the FRC specifies a sub-space close to the true graph, privacy is seriously breached.

[Figure: the space of graphs with the given degree seq., showing the true graph, the feature range constraint region, and a distance d]


Graph Generator with FDC

Feature Distribution Constraint (FDC)

Uniform generator:
• gives the natural distribution f(x) of feature S, which is highly skewed in the range

How to generate graphs s.t.
• they have the given degree seq.
• the feature value has the target distribution g(x)

[Figure: natural distribution f(x) and target distribution g(x)]

Graph Generator with FDC

• Based on the Metropolis-Hastings method
• Acceptance ratio depends on the target distr. g(x) & the natural distr. f(x)
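The slide does not give the acceptance ratio itself, so the sketch below uses the standard Metropolis-Hastings form under two assumptions: the proposal is symmetric, and uniform sampling of states yields feature distribution f(x), so each state is weighted by g(x)/f(x). A six-state ring stands in for the graph space to keep the example small; every name and number here is illustrative.

```python
import random

def accept_probability(x_cur, x_new, f, g):
    """Acceptance probability for a symmetric proposal with state weight g(x) / f(x)."""
    w_cur = g(x_cur) / f(x_cur)
    w_new = g(x_new) / f(x_new)
    return min(1.0, w_new / w_cur)

def metropolis_hastings(state, propose, feature, f, g, steps, rng):
    """Return the chain of visited states (the starting state is not recorded)."""
    chain = []
    for _ in range(steps):
        candidate = propose(state, rng)
        if rng.random() < accept_probability(feature(state), feature(candidate), f, g):
            state = candidate
        chain.append(state)
    return chain

# Toy stand-in for the graph space: six states on a ring.
def ring_proposal(state, rng):
    return (state + rng.choice((-1, 1))) % 6      # symmetric: step left or right

def feature(state):
    return 0 if state < 4 else 1                  # four states have S = 0, two have S = 1

def f(x):
    return {0: 4 / 6, 1: 2 / 6}[x]                # natural (skewed) distribution of S

def g(x):
    return {0: 0.5, 1: 0.5}[x]                    # target distribution of S

chain = metropolis_hastings(0, ring_proposal, feature, f, g, 50000, random.Random(0))
print(sum(feature(s) for s in chain) / len(chain))  # share of samples with S = 1
```

Under uniform sampling only a third of the states have S = 1; the g/f weighting is what pulls that share toward the target of one half. For real graphs, f(x) would itself have to be estimated, for example from a run of the uniform generator.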

Graph Generator with FDC: Evaluation

[Figures: target distribution and natural distribution]

Summary

• Graph generator with feature range constraint
-- Attackers can sample the graph space near the true graph and breach privacy.
• Graph generator with feature distribution constraint
-- Generate a set of graph samples for statistical testing

Questions?

Acknowledgments

This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.

Thank You!


Graph Generator and Privacy Issues

Example: graphs with degree sequence {3,2,2,2,3}.

Are nodes 1 and 5 connected?

[Figure: true graph and published graph]
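For a graph this small the question can be answered by brute force. The sketch below enumerates every simple graph on five nodes with this degree sequence and counts how many contain the link (1,5). It assumes node i carries the i-th degree in the list, which the slide does not state explicitly; the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

def graphs_with_degrees(degrees):
    """Yield every simple graph (as a frozenset of edges) with the given node degrees."""
    nodes = sorted(degrees)
    pairs = list(combinations(nodes, 2))
    m = sum(degrees.values()) // 2                # number of edges
    for edges in combinations(pairs, m):
        deg = dict.fromkeys(nodes, 0)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg == degrees:
            yield frozenset(edges)

degrees = {1: 3, 2: 2, 3: 2, 4: 2, 5: 3}          # assumed node-to-degree assignment
space = list(graphs_with_degrees(degrees))
confidence_15 = Fraction(sum((1, 5) in g for g in space), len(space))
print(len(space), confidence_15)                  # 7 6/7
```

Only seven graphs have this degree sequence, and six of them link nodes 1 and 5, so knowing the degree sequence alone already gives an attacker confidence 6/7 in that link.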

Graph Generator with FDC

Problem of generator with FRC:

Uniform generator:
• gives the natural distribution of feature S
• highly skewed in the range
• generates biased feature value

[Figure: real-world graph and range]