k-anonymity model


Presented by

Anubhav, Saurav, Ravi, Ashutosh (ASRA Group)

CSE/2k7

Guided by

Prof. Binod Kumar

13/07/2011, ASRA Group


1. Introduction

2. Motivation

3. Achieving Anonymity via Clustering

4. Proposed Algorithm

5. Experimental Results

6. Conclusion

7. Future Work


Data holders and statistics offices are facing tremendous demand for person-specific data for applications such as:

Data mining

Cost analysis

Fraud detection


How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified, while the data remains practically useful for survey work?



k-Anonymity Model



Uniquely identify you: Zipcode, Age, Gender. Sensitive: Disease.

Zipcode  Age  Gender  Disease
75275    22   Male    Flu
75277    23   Male    Cold
75278    24   Male    Diabetes
75275    33   Male    Flu
75275    38   Female  Arthritis
75275    36   Female  Heart problem

Quasi-identifiers: approximate foreign keys


Identifying: Mobile number, Name. Sensitive: Disease.

Mobile number  Name    Zipcode  Gender  Age  Disease
9905150112     Amit    75275    Male    22   Flu
9905121223     John    75277    Male    23   Cold
9431103097     Rajan   75278    Male    24   Diabetes
9334292352     Robin   75275    Male    33   Flu
9431109087     Ramesh  75275    Female  38   Arthritis
9421345678     Dhoni   75275    Female  36   Arthritis


Sensitive: Disease.

Age  Gender  Zip code  Disease
22   Male    75275     Flu
23   Male    75277     Cold
24   Male    75278     Diabetes
33   Male    75275     Flu
38   Female  75275     Arthritis
36   Female  75275     Heart problem
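
The re-identification risk behind these tables can be sketched as a join on the quasi-identifiers. The minimal Python sketch below assumes an attacker holds a hypothetical external list of names with zip code, gender and age (the names are reused from the identified table purely for illustration) and links it to the released table. The names `EXTERNAL`, `RELEASED` and `link_records` are illustrative, not part of the original slides.

```python
# Hypothetical external list: (name, zip code, gender, age).
EXTERNAL = [
    ("Amit", "75275", "Male", 22),
    ("John", "75277", "Male", 23),
    ("Rajan", "75278", "Male", 24),
    ("Robin", "75275", "Male", 33),
    ("Ramesh", "75275", "Female", 38),
    ("Dhoni", "75275", "Female", 36),
]

# Released table without names: (age, gender, zip code, disease).
RELEASED = [
    (22, "Male", "75275", "Flu"),
    (23, "Male", "75277", "Cold"),
    (24, "Male", "75278", "Diabetes"),
    (33, "Male", "75275", "Flu"),
    (38, "Female", "75275", "Arthritis"),
    (36, "Female", "75275", "Heart problem"),
]


def link_records(external, released):
    """Return {name: disease} for every person whose quasi-identifiers
    match exactly one released row."""
    by_qi = {}
    for age, gender, zip_code, disease in released:
        by_qi.setdefault((zip_code, gender, age), []).append(disease)

    matches = {}
    for name, zip_code, gender, age in external:
        candidates = by_qi.get((zip_code, gender, age), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches[name] = candidates[0]
    return matches


if __name__ == "__main__":
    for name, disease in link_records(EXTERNAL, RELEASED).items():
        print(name, "->", disease)
```

Because every combination of zip code, gender and age is unique in the released table, each person in the external list is linked to exactly one disease, even though names and mobile numbers were removed.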




Zip Code  Gender  Age  Disease   Expense
75277     Male    22   Flu       100
75277     Male    23   Cancer    3000
75277     Male    24   HIV+      5000
75275     Male    33   Diabetes  2500
75275     Female  38   Diabetes  2800
75275     Female  36   Diabetes  2600





Zip Code  Gender  Age      Disease   Expense
7527*     Person  [21-30]  Flu       100
7527*     Person  [21-30]  Cancer    3000
7527*     Person  [21-30]  HIV+      5000
7527*     Person  [31-40]  Diabetes  2500
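
The generalization shown in this table can be sketched in a few lines of Python. The sketch below assumes three generalization rules inferred from the rows above: mask the last zip code digit, replace gender with "Person", and bucket age into ranges of ten starting at 21, 31, and so on. The function names and parameters are illustrative, and the anonymity level is taken to be the size of the smallest group of rows sharing the same quasi-identifier values.

```python
from collections import Counter

# Raw rows: (zip code, gender, age, disease, expense).
RAW = [
    ("75277", "Male", 22, "Flu", 100),
    ("75277", "Male", 23, "Cancer", 3000),
    ("75277", "Male", 24, "HIV+", 5000),
    ("75275", "Male", 33, "Diabetes", 2500),
    ("75275", "Female", 38, "Diabetes", 2800),
    ("75275", "Female", 36, "Diabetes", 2600),
]


def generalize(row, zip_digits=4, age_width=10):
    """Generalize the quasi-identifiers of one row (assumed rules)."""
    zip_code, _gender, age, disease, expense = row
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    low = (age - 1) // age_width * age_width + 1  # 22 -> 21, 33 -> 31
    age_range = f"[{low}-{low + age_width - 1}]"
    return (masked_zip, "Person", age_range, disease, expense)


def anonymity_level(rows, qi_count=3):
    """Size of the smallest equivalence class over the first qi_count columns."""
    classes = Counter(row[:qi_count] for row in rows)
    return min(classes.values())


if __name__ == "__main__":
    generalized = [generalize(r) for r in RAW]
    print("k before:", anonymity_level(RAW))
    print("k after: ", anonymity_level(generalized))
```

Under these assumed rules the six raw rows collapse into two groups of three, so the generalized table is 3-anonymous, whereas every raw row is unique.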






Zip Code  Gender  Age      Disease   Expense
7527*     Male    [21-25]  Flu       100
7527*     Male    [21-25]  Cancer    3000
7527*     Male    [21-25]  HIV+      5000
75275     Person  [31-40]  Diabetes  2500





Zipcode  Gender  Age      Disease
83100*   Person  [25-30]  Flu
82530*   Person  [10-15]  Obesity
83400*   Person  [30-35]  Cancer
83100*   Person  [25-30]  HIV+
82530*   Person  [15-20]  Cancer
83400*   Person  [30-35]  Diabetes
82530*   Person  [25-30]  Obesity
83100*   Person  [25-30]  Flu
83400*   Person  [30-35]  Flu



How do we decide the number of clusters?



Distance between two numerical values
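
A minimal sketch of a normalized numeric distance follows. It assumes the definition |v1 - v2| / |D|, where |D| is the width of the attribute's domain (maximum minus minimum), as commonly used in clustering-based k-anonymization. The definition and the function name `numeric_distance` are assumptions for illustration.

```python
def numeric_distance(v1, v2, domain_min, domain_max):
    """Normalized distance between two numeric values, in [0, 1]
    when both values lie inside the domain."""
    width = domain_max - domain_min
    if width <= 0:
        raise ValueError("domain_max must be greater than domain_min")
    return abs(v1 - v2) / width


if __name__ == "__main__":
    # Ages from the example tables range from 22 to 38.
    print(numeric_distance(22, 26, 22, 38))
    print(numeric_distance(22, 38, 22, 38))
```

Dividing by the domain width keeps attributes with very different scales (for example age and expense) comparable when their distances are added together.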



Distance between two categorical values



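Distance between two categorical values

$\delta_C(v_i, v_j) = H(\Lambda(v_i, v_j)) / H(T_D)$

where $\Lambda(v_i, v_j)$ is the subtree rooted at the lowest common ancestor of $v_i$ and $v_j$, $T_D$ is the taxonomy tree of the domain, and $H(T)$ is the height of tree $T$.

Country

America Asia

North South East West

USA Canada Brazil Mexico India Egypt Iran Pakistan

Fig: Taxonomy tree of Country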
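
The categorical distance can be sketched in Python as the height of the subtree rooted at the lowest common ancestor of the two values, divided by the height of the whole taxonomy tree. The figure does not make clear which leaf belongs to which region, so the parent map below assumes two leaves per region in the order listed. That assignment and all function names are assumptions for illustration.

```python
# Child -> parent map for the Country taxonomy (leaf placement is assumed).
PARENT = {
    "America": "Country", "Asia": "Country",
    "North": "America", "South": "America",
    "East": "Asia", "West": "Asia",
    "USA": "North", "Canada": "North",
    "Brazil": "South", "Mexico": "South",
    "India": "East", "Egypt": "East",
    "Iran": "West", "Pakistan": "West",
}


def path_to_root(node):
    """Nodes from the given node up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def height(node):
    """Height of the subtree rooted at node (a leaf has height 0)."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return 0
    return 1 + max(height(c) for c in children)


def lowest_common_ancestor(a, b):
    ancestors = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors:
            return node
    raise ValueError("values are not in the same taxonomy tree")


def categorical_distance(a, b, root="Country"):
    """H(subtree at lowest common ancestor) / H(whole tree)."""
    return height(lowest_common_ancestor(a, b)) / height(root)


if __name__ == "__main__":
    for pair in [("USA", "Canada"), ("USA", "Brazil"), ("USA", "India")]:
        print(pair, categorical_distance(*pair))
```

With this tree, two countries in the same region are closer than two countries that only share a continent, and countries on different continents are at the maximum distance of 1.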



Function greedy_k_member_clustering (S, k)
  If ( |S| ≤ k )
    Return S;
  End if;
  Result = ∅; r = a randomly picked record from S;
  While ( |S| ≥ k )
    r = the furthest record from r;
    S = S − {r};
    C = {r};
    While ( |C| < k )
      r = find_best_record(S, C);
      S = S − {r};
      C = C ∪ {r};
    End while;
    Result = Result ∪ {C};
  End while;
  While ( |S| ≠ 0 )
    r = a randomly picked record from S;
    S = S − {r};
    C = find_best_cluster(Result, r);
    C = C ∪ {r};
  End while;
  Return Result;
End;



Function find_best_record (S, c)
Input: a set of records S and a cluster c
Output: a record r ∈ S such that IL(c ∪ {r}) is minimal
  n = |S|; min = ∞; best = null;
  for ( i = 1..n )
    r = i-th record in S;
    diff = IL(c ∪ {r}) − IL(c);
    If ( diff < min )
      min = diff;
      best = r;
    End if;
  End for;
  Return best;
End;



Function find_best_cluster (C, r)
Input: a set of clusters C and a record r
Output: a cluster c ∈ C such that IL(c ∪ {r}) is minimal
  n = |C|; min = ∞; best = null;
  for ( i = 1..n )
    c = i-th cluster in C;
    diff = IL(c ∪ {r}) − IL(c);
    If ( diff < min )
      min = diff;
      best = c;
    End if;
  End for;
  Return best;
End;
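
The three functions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the presenters' implementation: records are tuples of numeric attributes only, the information loss IL of a cluster is assumed to be its size times the sum of its normalized attribute ranges, and the distance between records is the sum of normalized attribute differences. The `seed` parameter is added only to make runs repeatable.

```python
import random


def information_loss(cluster, ranges):
    """Assumed IL: |cluster| * sum of (max - min) / domain width per attribute."""
    if not cluster:
        return 0.0
    spread = 0.0
    for i, width in enumerate(ranges):
        values = [record[i] for record in cluster]
        spread += (max(values) - min(values)) / width
    return len(cluster) * spread


def distance(a, b, ranges):
    """Sum of normalized attribute differences between two records."""
    return sum(abs(x - y) / w for x, y, w in zip(a, b, ranges))


def find_best_record(remaining, cluster, ranges):
    """Record whose addition to the cluster increases IL the least."""
    base = information_loss(cluster, ranges)
    return min(remaining,
               key=lambda r: information_loss(cluster + [r], ranges) - base)


def find_best_cluster(clusters, record, ranges):
    """Cluster whose IL increases the least when the record is added."""
    return min(clusters,
               key=lambda c: information_loss(c + [record], ranges)
               - information_loss(c, ranges))


def greedy_k_member_clustering(records, k, seed=0):
    rng = random.Random(seed)
    remaining = list(records)
    if len(remaining) <= k:
        return [remaining]
    # Domain width per attribute; 1 avoids division by zero for constants.
    ranges = [(max(col) - min(col)) or 1 for col in zip(*remaining)]
    result = []
    r = rng.choice(remaining)
    while len(remaining) >= k:
        # Start each new cluster from the record furthest from the previous r.
        r = max(remaining, key=lambda x: distance(x, r, ranges))
        remaining.remove(r)
        cluster = [r]
        while len(cluster) < k:
            r = find_best_record(remaining, cluster, ranges)
            remaining.remove(r)
            cluster.append(r)
        result.append(cluster)
    # Fewer than k records are left: add each to its best-fitting cluster.
    while remaining:
        r = remaining.pop(rng.randrange(len(remaining)))
        find_best_cluster(result, r, ranges).append(r)
    return result


if __name__ == "__main__":
    people = [(22, 75275), (23, 75277), (24, 75278),
              (33, 75275), (38, 75275), (36, 75275)]  # (age, zip code)
    for cluster in greedy_k_member_clustering(people, k=3):
        print(cluster)
```

Every cluster is grown to exactly k records before the next one is started, so each returned cluster has at least k members. Leftover records only ever increase cluster sizes.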



The time complexity of this algorithm is $O((n^2 \log n)/c)$, where $n$ is the number of records and $c$ is the average number of records in each cluster.

The time complexity of this algorithm is better than that of the greedy k-member algorithm.
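
A rough way to see when this comparison holds is to evaluate both growth terms for sample sizes. The sketch below assumes the greedy k-member algorithm costs on the order of n^2 operations, uses a base-2 logarithm, and ignores constant factors. These are all assumptions, so the numbers indicate a trend rather than measured running times.

```python
import math


def clustering_cost(n, c):
    """Growth term n^2 * log2(n) / c for n records and average cluster size c."""
    return n * n * math.log2(n) / c


def greedy_k_member_cost(n):
    """Assumed growth term n^2 for the greedy k-member algorithm."""
    return n * n


if __name__ == "__main__":
    n = 1024
    for c in (5, 10, 20):
        ratio = clustering_cost(n, c) / greedy_k_member_cost(n)
        print(f"c={c}: ratio={ratio:.2f}")
```

Under these assumptions the ratio equals log2(n) / c, so the first bound is smaller only when the average cluster size exceeds log2(n).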



It is difficult to decide a proper value for the user-defined threshold.

This algorithm might delete many records, which in turn causes significant information loss.

This algorithm is less sensitive to outliers.


The main goal of the experiments was to investigate the implementation of the k-anonymity model using a clustering algorithm. We mainly focus on data quality, k-anonymization and scalability, which are the main considerations of the k-anonymity model.


Finally, data quality is the main problem in k-anonymization. We therefore focus on data quality, rather than computational efficiency, as the main consideration in the k-anonymity model. We are encouraged by our results, which demonstrate that our algorithm is flexible and able to produce a range of desired anonymizations.


Encouraged by the experimental results, we are currently working on more efficient heuristics to improve the performance of our approach.

We are also working to apply this clustering algorithm to fraud detection.


1. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557-570 (2002)

2. Byun, J.-W., et al.: Efficient k-Anonymization Using Clustering Techniques. In: Kotagiri, R., et al. (Eds.): DASFAA 2007, LNCS 4443, pp. 188-200 (2007)
