k-anonymity model
TRANSCRIPT
-
8/3/2019 k-Anonymity model
1/28
Presented by
Anubhav, Saurav, Ravi, Ashutosh (ASRA Group)
CSE/2k7
Guided by
Prof. Binod Kumar
13/07/2011ASRA Group 1
-
1. Introduction
2. Motivation
3. Achieving Anonymity via Clustering
4. Proposed algorithm
5. Experimental result
6. Conclusion
7. Future Work
-
Data holders and statistics offices are facing tremendous demand for person-specific data for applications such as:
Data mining
Cost analysis
Fraud detection
-
How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data can't be re-identified, while the data remains practically useful for survey work?
-
k-Anonymity Model
-
Zipcode, Age, Gender: uniquely identify you!
Disease: sensitive

Zipcode  Age  Gender  Disease
75275    22   Male    Flu
75277    23   Male    Cold
75278    24   Male    Diabetes
75275    33   Male    Flu
75275    38   Female  Arthritis
75275    36   Female  Heart problem

Quasi-identifiers: approximate foreign keys
-
Mobile number, Name: identifying
Disease: sensitive

Mobile number  Name    Zipcode  Gender  Age  Disease
9905150112     Amit    75275    Male    22   Flu
9905121223     John    75277    Male    23   Cold
9431103097     Rajan   75278    Male    24   Diabetes
9334292352     Robin   75275    Male    33   Flu
9431109087     Ramesh  75275    Female  38   Arthritis
9421345678     Dhoni   75275    Female  36   Arthritis

Quasi-identifiers: approximate foreign keys
-
Disease: sensitive

Age  Gender  Zip code  Disease
22   Male    75275     Flu
23   Male    75277     Cold
24   Male    75278     Diabetes
33   Male    75275     Flu
38   Female  75275     Arthritis
36   Female  75275     Heart problem

Quasi-identifiers: approximate foreign keys
-
Zip Code  Gender  Age  Disease   Expense
75277     Male    22   Flu       100
75277     Male    23   Cancer    3000
75277     Male    24   HIV+      5000
75275     Male    33   Diabetes  2500
75275     Female  38   Diabetes  2800
75275     Female  36   Diabetes  2600

Quasi-identifiers: approximate foreign keys
-
Zip Code  Gender  Age      Disease   Expense
7527*     Person  [21-30]  Flu       100
7527*     Person  [21-30]  Cancer    3000
7527*     Person  [21-30]  HIV+      5000
7527*     Person  [31-40]  Diabetes  2500
7527*     Person  [31-40]  Diabetes  2800
7527*     Person  [31-40]  Diabetes  2600
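The generalization step shown in the table above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: the helper names `generalize` and `is_k_anonymous`, the decade-wide age buckets, and the choice to treat the first three columns as quasi-identifiers are assumptions made for the example.

```python
from collections import Counter

# Raw records from the earlier slide: (zip code, gender, age, disease, expense).
RAW = [
    ("75277", "Male", 22, "Flu", 100),
    ("75277", "Male", 23, "Cancer", 3000),
    ("75277", "Male", 24, "HIV+", 5000),
    ("75275", "Male", 33, "Diabetes", 2500),
    ("75275", "Female", 38, "Diabetes", 2800),
    ("75275", "Female", 36, "Diabetes", 2600),
]

def generalize(record):
    """Mask the last zip digit, suppress gender, and bucket age by decade."""
    zip_code, _gender, age, disease, expense = record
    low = (age - 1) // 10 * 10 + 1  # 22 -> 21, 38 -> 31
    return (zip_code[:4] + "*", "Person", f"[{low}-{low + 9}]", disease, expense)

def is_k_anonymous(rows, k, qi_count=3):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(row[:qi_count] for row in rows)
    return all(size >= k for size in groups.values())

released = [generalize(r) for r in RAW]
print(is_k_anonymous(RAW, 3))       # False: every raw record is unique
print(is_k_anonymous(released, 3))  # True: two groups of three records
```

Each released record is indistinguishable from at least two others on its quasi-identifiers, which is what k = 3 requires.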
-
Zip Code  Gender  Age      Disease   Expense
7527*     Male    [21-25]  Flu       100
7527*     Male    [21-25]  Cancer    3000
7527*     Male    [21-25]  HIV+      5000
75275     Person  [31-40]  Diabetes  2500
75275     Person  [31-40]  Diabetes  2800
75275     Person  [31-40]  Diabetes  2600
-
Zipcode  Gender  Age      Disease
83100*   Person  [25-30]  Flu
82530*   Person  [10-15]  Obesity
83400*   Person  [30-35]  Cancer
83100*   Person  [25-30]  HIV+
82530*   Person  [15-20]  Cancer
83400*   Person  [30-35]  Diabetes
82530*   Person  [25-30]  Obesity
83100*   Person  [25-30]  Flu
83400*   Person  [30-35]  Flu
-
How do we decide the number of clusters?
-
Distance between two numerical values
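The formula for this slide is not present in the transcript. As a hedged sketch, the snippet below assumes the usual normalized form, |v1 - v2| divided by the size of the attribute's domain (maximum minus minimum), which is how the clustering paper listed in the references defines it. The function name and the sample ages, taken from the earlier example table, are illustrative.

```python
def numeric_distance(v1, v2, domain_min, domain_max):
    """Normalized distance |v1 - v2| / (domain_max - domain_min), in [0, 1]."""
    domain_size = domain_max - domain_min
    if domain_size <= 0:
        raise ValueError("domain_max must be greater than domain_min")
    return abs(v1 - v2) / domain_size

ages = [22, 23, 24, 33, 38, 36]
print(numeric_distance(22, 38, min(ages), max(ages)))  # 1.0
print(numeric_distance(22, 24, min(ages), max(ages)))  # 0.125
```

Normalizing by the domain size keeps every numeric attribute on the same 0 to 1 scale, so ages and expenses can be summed into one record-level distance.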
-
Distance between two categorical values
-
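Distance between two categorical values
$\delta_C(v_i, v_j) = H(\Lambda(v_i, v_j)) / H(T_D)$
where $\Lambda(v_i, v_j)$ is the subtree rooted at the lowest common ancestor of $v_i$ and $v_j$, and $H(T)$ is the height of tree $T$.

Fig.: Taxonomy tree of Country
Level 0: Country
Level 1: America, Asia
Level 2: North, South, East, West
Level 3: USA, Canada, Brazil, Mexico, India, Egypt, Iran, Pakistan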
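The categorical distance can be illustrated with a small Python sketch over the Country taxonomy. The parent links below are an assumption: the figure is flattened in the transcript, so leaves are assigned to regions in left-to-right order. The helper names are hypothetical, and height is counted in edges, with a leaf having height 0.

```python
# Parent links for the taxonomy tree. The leaf-to-region assignment is an
# assumption that follows the left-to-right order of the flattened figure.
PARENT = {
    "America": "Country", "Asia": "Country",
    "North": "America", "South": "America",
    "East": "Asia", "West": "Asia",
    "USA": "North", "Canada": "North",
    "Brazil": "South", "Mexico": "South",
    "India": "East", "Egypt": "East",
    "Iran": "West", "Pakistan": "West",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def subtree_height(node):
    """Height in edges of the subtree rooted at node; a leaf has height 0."""
    children = [child for child, parent in PARENT.items() if parent == node]
    return 1 + max(map(subtree_height, children)) if children else 0

def categorical_distance(v1, v2, root="Country"):
    """Height of the lowest common subtree of v1 and v2 over the tree height."""
    ancestors_of_v1 = path_to_root(v1)
    common = next(n for n in path_to_root(v2) if n in ancestors_of_v1)
    return subtree_height(common) / subtree_height(root)

print(categorical_distance("USA", "Canada"))  # 0.333... (common subtree: North)
print(categorical_distance("USA", "Brazil"))  # 0.666... (common subtree: America)
print(categorical_distance("USA", "India"))   # 1.0 (common subtree: Country)
```

Values that share a close ancestor get a small distance, while values that only meet at the root get the maximum distance of 1.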
-
Function greedy_k_member_clustering(S, k)
  If (|S| ≤ k)
    Return S;
  End if;
  Result = ∅; r = a randomly picked record from S;
  While (|S| ≥ k)
    r = the furthest record from r;
    S = S − {r};
    C = {r};
    While (|C| < k)
      r = find_best_record(S, C);
      S = S − {r};
      C = C ∪ {r};
    End while;
    Result = Result ∪ {C};
  End while;
  While (|S| ≠ 0)
    r = a randomly picked record from S;
    S = S − {r};
    C = find_best_cluster(Result, r);
    C = C ∪ {r};
  End while;
  Return Result;
-
Function find_best_record(S, c)
Input: a set of records S and a cluster c
Output: a record r ∈ S such that IL(c ∪ {r}) is minimal
  n = |S|; min = ∞; best = null;
  for (i = 1..n)
    r = i-th record in S;
    diff = IL(c ∪ {r}) − IL(c);
    if (diff < min)
      min = diff;
      best = r;
    end if;
  end for;
  return best;
-
Function find_best_cluster(C, r)
Input: a set of clusters C and a record r
Output: a cluster c ∈ C such that IL(c ∪ {r}) is minimal
  n = |C|; min = ∞; best = null;
  for (i = 1..n)
    c = i-th cluster in C;
    diff = IL(c ∪ {r}) − IL(c);
    if (diff < min)
      min = diff;
      best = c;
    end if;
  end for;
  return best;
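The three functions can be sketched together in Python. This is a minimal sketch under stated assumptions, not the presenters' code. Records are tuples of numeric quasi-identifiers only. The information loss IL is assumed to be the cluster size times the sum of range-normalized attribute spreads, since the slides do not define it. `make_metrics` and the `seed` parameter are hypothetical additions, and a dataset with at most k records is returned as a single cluster.

```python
import random

def make_metrics(records):
    """Build IL and distance functions normalized by each attribute's range."""
    dims = range(len(records[0]))
    spans = [max(r[j] for r in records) - min(r[j] for r in records) or 1 for j in dims]

    def info_loss(cluster):
        # Assumed IL: cluster size times the sum of normalized attribute spreads.
        if not cluster:
            return 0.0
        return len(cluster) * sum(
            (max(r[j] for r in cluster) - min(r[j] for r in cluster)) / spans[j]
            for j in dims)

    def distance(a, b):
        return sum(abs(a[j] - b[j]) / spans[j] for j in dims)

    return info_loss, distance

def find_best_record(S, c, info_loss):
    """Record in S whose addition to cluster c increases IL the least."""
    base = info_loss(c)
    return min(S, key=lambda r: info_loss(c + [r]) - base)

def find_best_cluster(clusters, r, info_loss):
    """Cluster whose IL increases the least when r is added."""
    return min(clusters, key=lambda c: info_loss(c + [r]) - info_loss(c))

def greedy_k_member_clustering(records, k, seed=0):
    S = list(records)
    if len(S) <= k:
        return [S]
    info_loss, distance = make_metrics(S)
    rng = random.Random(seed)
    result = []
    r = rng.choice(S)
    while len(S) >= k:
        r = max(S, key=lambda x: distance(x, r))  # furthest record from r
        S.remove(r)
        c = [r]
        while len(c) < k:
            r = find_best_record(S, c, info_loss)
            S.remove(r)
            c.append(r)
        result.append(c)
    while S:  # fewer than k records remain; attach each to its best cluster
        r = S.pop(rng.randrange(len(S)))
        find_best_cluster(result, r, info_loss).append(r)
    return result

# (zip code, age) pairs from the earlier example; treating the zip code as a
# number is a simplification made for this sketch.
data = [(75277, 22), (75277, 23), (75277, 24), (75275, 33), (75275, 38), (75275, 36)]
for cluster in greedy_k_member_clustering(data, k=3):
    print(sorted(cluster))
```

Every cluster ends up with at least k records, so generalizing each cluster to its attribute ranges yields a k-anonymous table.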
-
The time complexity of this algorithm is $O(n^2 \log(n) / c)$, where $c$ is the average number of records in each cluster.
The time complexity of this algorithm is better than that of the greedy k-member algorithm.
-
It is difficult to decide a proper value for the user-defined threshold.
This algorithm might delete many records, which in turn causes significant information loss.
This algorithm is less sensitive to outliers.
-
The main goal of the experiments was to investigate the implementation of the k-anonymity model using a clustering algorithm. We mainly focus on data quality, k-anonymization, and scalability, which are the main considerations of the k-anonymity model.
-
Finally, data quality is the major problem in k-anonymization. We focus on data quality rather than computational efficiency, because data quality should be the main consideration in the k-anonymity model. We are encouraged by our results, which demonstrate that our algorithm is flexible and able to produce a range of desired anonymizations.
-
Encouraged by the experimental results, we are currently working on more efficient heuristics to improve the performance of our approach.
We are also working to utilize this clustering algorithm to detect fraud.
-
1. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557-570 (2002)
2. Byun, J.-W., et al.: Efficient k-Anonymization Using Clustering Techniques. In: Kotagiri, R., et al. (Eds.): DASFAA 2007, LNCS 4443, pp. 188-200 (2007)