p rivacy -p reserving r eleases of s ocial n etworks chih-hua tai dept. of csie, national taipei...

51
PRIVACY-PRESERVING RELEASES OF SOCIAL NETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Upload: debra-griffith

Post on 26-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY-PRESERVING RELEASES OF SOCIAL NETWORKS

Chih-Hua Tai

Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Page 2: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

DATA MINING

The primary task in data mining: development of models about aggregated data. Finding frequent patterns Finding rules

2

Page 3: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

FINDING PATTERNS

3

Page 4: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

FINDING RULES

4

Page 5: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

DATA MINING VS. PRIVACY

The primary task in data mining: development of models about aggregated data. Finding frequent patterns Finding rules

Can we develop accurate models without access to precise information in individual data records? Why?

5

Page 6: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY ISSUES IN DATA

6

Page 7: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY ISSUES IN DATA

7

Page 8: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY ISSUES IN DATA

8

Page 9: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

DATA MINING AND PRIVACY

The primary task in data mining: development of models about aggregated data.

Can we develop accurate models without access to precise information in individual data records?

Answer: yes, by randomization. R. Agrawal, R. Srikant “Privacy Preserving Data

Mining,” SIGMOD 2000 How about the data utility?

9

Page 10: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

DATA MINING VS. SOCIAL NETWORKS

10

Attributes: Name, Salary, …

Links: Friends, Neighborhood, …

Communities: Interests, Activities, …

Page 11: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY ISSUES ON SOCIAL NETWORKS

Personal information leaked, even if the vertex identifies are hidden…

11

Many information can be used to re-associate the vertex with its identity.

Vertex degree: k-degree anonymity , …

Neighborhood configuration: k-neighborhood anonymity, k-automorphism anonymity, k-isomorphism anonymity, grouping-and- collapsing, …

Page 12: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

13

C.-H. Tai, P. S. Yu, D.-N. Yang and M.-S. Chen, "Privacy-preserving Social Network Publication Against Friendship Attacks," In KDD, 2011.

Page 13: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

FRIENDSHIP ATTACK

Still there are another type of information for vertex re-identification – friendship attack

14

Page 14: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

FRIENDSHIP ATTACK

Given a target individual A and the degree pair information D2 = (d1,d2), a friendship attack (D2,A) exploits D2 to identify a vertex v1 corresponding to A in a published social network where v1 connects to another vertex v2 with the degree pair (dv1

,dv2) = (d1,d2).

15

10

1 2 3

4 5

Alice

6 7

9

8

Ex 1. Assume that an attacker knows that Alice has 3 connections, Bob has 2 connections, and Alice and Bob are friends. The attacker identifies v9 as Alice with 100% confidence.

Page 15: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

FRIENDSHIP ATTACK

16

In DBLP data set, the percentages of vertices that can be re-identified with a probability larger than 1/k by degree and friendship attacks.

Original Social Network

k-degree anonymized Social Network

kDegree Attack Friendship Attack Friendship Attack

5 0.28% 5.37% 2.89%

10 0.53% 10.69% 4.65%

15 0.73% 14.71% 5.82%

20 0.93% 18.44% 7.23%

Page 16: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

NEW PRIVACY MODEL AGAINST FRIENDSHIP ATTACK

k2-degree AnonymityA social network is k2-degree anonymous

if, for every vertex with an incident edge of degree pair (d1,d2), there exist at least k − 1 other vertices, each of which also has an incident edge of the same degree pair.

17

10

1 2 3

4 5

6 7

9

8

Ex 2. Even with the knowledge (D2,A)=((3,2),Alice), the probability that an attacker can re-identify Alice in the 22-degree anonymous social network is limited to ½.

Page 17: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE ANONYMIZATION

Problem formulation: Given a graph G(V, E) and an integer k, the problem is to

anonymize G to satisfy k2-degree anonymity such that information distortion is minimized.

The challenges: Any alteration on an edge will affect the degrees of

two vertices.

18

Page 18: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

GRAPH ANONYMIZATION ALGORITHMS

Integer Programming formulation Obtain the optimal solution with bad scalability

DEgree SEqence ANonymization (DESEAN) Step1. Degree Sequence Anonymization.

- determine the groups of vertices protecting each others

Step2. Privacy Constraint Satisfaction.- eliminate the advantage of knowing friendship information

Step3. Anonymous Degree Realization.- have the vertices in the same group share the same vertex degree 19

Page 19: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Step1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a

target degree dx for each cluster x s. t. each cluster contains at least k vertices and the weighted degree difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as small as possible.

ALGORITHM DESEAN

20

10

1 2 3

4 5

6 7

9

8

Ex 3. Given k = 2 and ω = 0.5.

Page 20: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

ALGORITHM DESEAN

Step1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a

target degree dx for each cluster x s. t. each cluster contains at least k vertices and the weighted degree difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as small as possible.

21

10

1 2 3

4 5

6 7

9

8

10

1 2 3

4 5

6 7

9

8

Ex 3. Given k = 2 and ω = 0.5.

Page 21: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Step2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure

that, for each pair of clusters (x,y), the number of vertices in x directly connected to the vertices in y is either zero or not less than k.

ALGORITHM DESEAN

22

10

1 2 3

4 5

6 7

9

8

Ex 3. Given k = 2 and ω = 0.5.

Page 22: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Step2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure

that, for each pair of clusters (x,y), the number of vertices in x directly connected to the vertices in y is either zero or not less than k.

ALGORITHM DESEAN

23

10

1 2 3

4 5

6 7

9

8

10

1 2 3

4 5

6 7

9

8

Ex 3. Given k = 2 and ω = 0.5.

Page 23: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Step3. Anonymous Degree Realization. Adjust edges in G s. t. the vertices in each

cluster x meet the target degree dx selected in Step 1.

10

1 2 3

4 5

6 7

9

8

ALGORITHM DESEAN

24

Ex 3. Given k = 2 and ω = 0.5.

Page 24: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

Step3. Anonymous Degree Realization. Adjust edges in G s. t. the vertices in each

cluster x meet the target degree dx selected in Step 1.

10

1 2 3

4 5

6 7

9

8

10

1 2 3

4 5

6 7

9

8

ALGORITHM DESEAN

25

Ex 3. Given k = 2 and ω = 0.5.

Page 25: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

26

C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, ”Structural diversity for privacy in publishing social networks,” In SDM, 2011.

Page 26: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

COMMUNITY IDENTIFICATION

Vertex identification is considered to be an important privacy issue in publishing social networks. ◦ k-degree anonymity, k-neighborhood anonymity, …

In addition to a vertex identity, each individual is also associated with a community identity. ◦ Could be used to infer the political party affiliation or disease

information sensitive to the public.◦ Is a kind of structural information

27

Page 27: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

COMMUNITY IDENTIFICATION

Community information is explicitly given:

Ex.

Alice knows recently… Bob is sick Bob participates in this social network Bob makes 5 friends. (vertex degree attack)

Bob has AIDS!

AIDS Com. SLE Com.

28

Page 28: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

COMMUNITY IDENTIFICATION

Community information is not given:

Ex.

Alice knows Bob participates in this social network and has 5 friends. (vertex degree attack)

Alice can know the approximation of Bob’s neighborhood.

29

Page 29: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

COMMUNITY IDENTIFICATION

% of vertices violating k-structural diversity◦ (a) DBLP◦ (b) ca-CondMat

k-degree anonymization is insufficient 30

Page 30: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

COMMUNITY IDENTIFICATION

structural diversity◦ (a) original DBLP◦ (b) 10-degree anonymized DBLP

Vertices with large degrees appear in a small set of communities 31

Page 31: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

NEW PRIVACY MODEL AGAINST COMMUNITY IDENTIFICATION

k-Structural Diversity To protect against vertex degree attack, for each vertex, there

should be other vertices with the same degree located in at least k-1 other communities.

If a graph satisfies k-structural diversity, then it also satisfies k-degree anonymity.

32

Page 32: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE ANONYMIZATION

Problem formulation: Given a graph G(V, E, C) and an integer k, 1 k |C|, the ≦ ≦

problem is to anonymize G to satisfy k-structural diversity such that information distortion is minimized.

The challenges: How to preserve community structures, even in

the implicit cases, while preserving privacy.

33

Page 33: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROBLEM FORMULATION

Operation Adding Edge ◦ Connect two vertices belonging to the same community.◦ Can avoid destroying the communities.

34

Page 34: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE MERGENCE

◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d

Com. 1

v

Com. 2

Com. 335

Page 35: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE MERGENCE

◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d

Com. 1

v

Com. 2

Com. 336

Page 36: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE CREATION

To create a new anonymous group for a vertex v, such that all the vertices in the group locate in at least k difference communities and have the same degree as v

Com. 1 Com. 2 Com. 3

v

37

Page 37: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE CREATION

To create a new anonymous group for a vertex v, such that all the vertices in the group locate in at least k difference communities and have the same degree as v

Com. 1 Com. 2 Com. 3

v

38

Page 38: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE EDGE-REDITECTION MECHANISM

◦ Is defined on w, v, x in the same community w: an anonymized vertex v and x : two not-yet-anonymized vertices

◦ Is to replace the edge (w, v) with the edge (w, x)

v w

v w

x

39

Page 39: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE EDGE-REDITECTION MECHANISM

By mergence

v w

Com. 1 Com. 2

Com. 3

Q. Why needs the Edge-Reditection mechanism?

By mergence

v w

Com. 1 Com. 2

Com. 3

40

Page 40: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE EDGE-REDITECTION MECHANISM

By creation

Com. 1

Com. 2 Com. 3

v w

Q. Why needs the Edge-Reditection mechanism?

By creation

Com. 1

Com. 2 Com. 3

v w

41

Page 41: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

ALGORITHM EDGECONNECT (EC)

Procedures Mergence and Creation◦ Let Rv be the set of edges that could be redirected away from v

The Edge-Reditection mechanism

42

Page 42: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROBLEM FORMULATION

Operation Adding Edge ◦ Connect two vertices belonging to the same community.◦ Can avoid destroying the communities.

Operation Splitting Vertex◦ Replace a vertex v with a set of substitute vertices, such that

each substitute vertex is connected with at least one edge incident to v originally.

◦ Each substitute vertex presents partial truth of the vertex v.

43

Page 43: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE CREATIONBYSPLIT

Split a set of vertices, including v, to create a new anonymous group

Com. 1 Com. 2

Com. 3

v

Com. 1 Com. 2

Com. 3

v1v2

44

Page 44: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PROCEDURE MERGEBYSPLIT

Split v into a set of substitute vertices s.t. each substitute vertex is protected in some existing anonymous group

Com. 1 Com. 2

Com. 3 v

v1

v2

v3

45

Page 45: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

46

C.-H. Tai, P.-J. Tseng, P. S. Yu and M.-S. Chen, "Identities Anonymization in Dynamic Social Networks," In ICDM-11, 2011.

Page 46: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE PROBLEM IN DYNAMIC SCENARIOS…

A dynamic social network will be sequentially released.

An attacker can monitor a victim for a period w.

Therefore, the adversary knowledge includes: The releases G t-w+1,

G t-w+2, …, G t during w A degree sequence Δv

w=(dvt-

w+1, dv

t-w+2, …, dvt) of a victim v

during w47

G2G1

Ex. John has two friends at time 1, and three friends at time 2.

Page 47: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY MODEL: KW-STRUCTURAL DIVERSITY ANONYMITY

Base case of w=1 A group θd, consisting of all

vertices of degree d, is a k-shielding group if there is a vertex subset θ ⊆ θd s. t. (1) |θ |≥ k, and (2) any two vertices u and v in θ,

Cv ∩ Cv = ø, where C is the community identity.

48

The adversary knowledge includes: 1. The release

social network G t 2. A degree

sequence Δv

1=(dvt) of a

victim v

Ex. Mary has four friends.

???

G

Page 48: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

PRIVACY MODEL: KW-STRUCTURAL DIVERSITY ANONYMITY

Dynamic scenarios of w>1 A consistent group ΘΔ is the set of vertices

that always share the same degree during w.

A consistent group ΘΔ is a k-shielding if

at each time instant t in w, there is a vertex subset Θ t

⊆ ΘΔ s. t. (1) |Θ t |≥ k, and (2) any two vertices u and v in Θ t, Cv

t ∩ Cv t = ø, where

C t is the community identity at time t.

49

The adversary knowledge of includes: 1. The releases G t-

w+1, G t-w+2, …, G t during w

2. A degree sequence Δv

w=(dvt-w+1, dv

t-

w+2, …, dvt) of a

victim v during w

Page 49: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THE ANONYMIZATION

Problem formulation: Suppose that every vertex in a series of sequential

releases G t-w+1, G t-w+2, …, G t-1 is protected. Given G t-w+1, G t-w+2, …, G t-1 and k, anonymize the

current social network Gt s. t. every vertex is protected in a k-shielding consistent group.

The challenges: The anonymization is depended on not only the

current social network but also previous w-1 releases.

Searching through all the w-1 releases to eliminate privacy leak is time consuming.

50

Page 50: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

ANONYMIZATION ALGORITHM

Construct CS (Clustering Sequence) -Table to prevent the search through w graphs.

CS-Table Summary the vertex information in w-1 previous

releases. Fetch v’ info. in w graphs without scanning the graphs.

According to the degree sequence during w, sort the vertices in hierarchical clustering manner. Vertices in the same k-consistent shielding group are close in CS-Table. CS-Table can be incrementally updated.

According to the vertex ranking in CS-Table, anonymize each vertex to be protected.

51

Page 51: P RIVACY -P RESERVING R ELEASES OF S OCIAL N ETWORKS Chih-Hua Tai Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

THANK YOU~!