k-anonymity model

Upload: anubhav129

Post on 07-Apr-2018


  • 8/3/2019 k-Anonymity model

    1/28

    Presented by

    Anubhav, Saurav, Ravi, Ashutosh (ASRA Group)

    CSE/2k7

    Guided by

    Prof. Binod Kumar

    13/07/2011, ASRA Group


    1. Introduction

    2. Motivation

    3. Achieving Anonymity via Clustering

    4. Proposed algorithm

    5. Experimental result

    6. Conclusion

    7. Future Work


    Data holders and statistics offices face tremendous demand for person-specific data for applications such as:

    Data mining

    Cost analysis

    Fraud detection


    How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified, while the data remains practically useful for survey work?


    k-Anonymity Model


    Quasi-identifiers (approximate foreign keys): Zipcode, Age, Gender. Together they can uniquely identify you!

    Sensitive attribute: Disease

    Zipcode  Age  Gender  Disease
    75275    22   Male    Flu
    75277    23   Male    Cold
    75278    24   Male    Diabetes
    75275    33   Male    Flu
    75275    38   Female  Arthritis
    75275    36   Female  Heart problem


    Identifying attributes: Mobile number, Name

    Quasi-identifiers (approximate foreign keys): Zipcode, Gender, Age

    Sensitive attribute: Disease

    Mobile number  Name    Zipcode  Gender  Age  Disease
    9905150112     Amit    75275    Male    22   Flu
    9905121223     John    75277    Male    23   Cold
    9431103097     Rajan   75278    Male    24   Diabetes
    9334292352     Robin   75275    Male    33   Flu
    9431109087     Ramesh  75275    Female  38   Arthritis
    9421345678     Dhoni   75275    Female  36   Arthritis


    Quasi-identifiers (approximate foreign keys): Age, Gender, Zip code

    Sensitive attribute: Disease

    Age  Gender  Zip code  Disease
    22   Male    75275     Flu
    23   Male    75277     Cold
    24   Male    75278     Diabetes
    33   Male    75275     Flu
    38   Female  75275     Arthritis
    36   Female  75275     Heart problem


    Quasi-identifiers (approximate foreign keys): Zip Code, Gender, Age

    Zip Code  Gender  Age  Disease   Expense
    75277     Male    22   Flu       100
    75277     Male    23   Cancer    3000
    75277     Male    24   HIV+      5000
    75275     Male    33   Diabetes  2500
    75275     Female  38   Diabetes  2800
    75275     Female  36   Diabetes  2600


    Zip Code  Gender  Age      Disease   Expense
    7527*     Person  [21-30]  Flu       100
    7527*     Person  [21-30]  Cancer    3000
    7527*     Person  [21-30]  HIV+      5000
    7527*     Person  [31-40]  Diabetes  2500
    7527*     Person  [31-40]  Diabetes  2800
    7527*     Person  [31-40]  Diabetes  2600
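The generalized table above can be reproduced and checked mechanically. The following is a minimal sketch, not the presenters' code: the helper names `generalize` and `anonymity_level` are hypothetical, and the generalization rules (mask the last zip digit, replace gender with "Person", bucket age by decade) are simply read off the table. The anonymity level k is the size of the smallest group of rows that share the same quasi-identifier values.

```python
from collections import Counter

# Rows of the ungeneralized table: (zip code, gender, age, disease, expense).
RAW = [
    ("75277", "Male", 22, "Flu", 100),
    ("75277", "Male", 23, "Cancer", 3000),
    ("75277", "Male", 24, "HIV+", 5000),
    ("75275", "Male", 33, "Diabetes", 2500),
    ("75275", "Female", 38, "Diabetes", 2800),
    ("75275", "Female", 36, "Diabetes", 2600),
]


def generalize(row):
    """Hypothetical rule set: mask last zip digit, hide gender, bucket age by decade."""
    zip_code, _gender, age, disease, expense = row
    low = (age - 1) // 10 * 10 + 1  # 22 -> 21, 38 -> 31
    return (zip_code[:4] + "*", "Person", f"[{low}-{low + 9}]", disease, expense)


def anonymity_level(rows, qi_columns=(0, 1, 2)):
    """Size of the smallest group of rows with identical quasi-identifier values."""
    groups = Counter(tuple(row[i] for i in qi_columns) for row in rows)
    return min(groups.values())


generalized = [generalize(row) for row in RAW]
print(anonymity_level(RAW))          # 1: every raw row is unique on its quasi-identifiers
print(anonymity_level(generalized))  # 3: two groups of three rows each
```

Under these assumptions the raw table is only 1-anonymous, while the generalized table is 3-anonymous.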


    Zip Code  Gender  Age      Disease   Expense
    7527*     Male    [21-25]  Flu       100
    7527*     Male    [21-25]  Cancer    3000
    7527*     Male    [21-25]  HIV+      5000
    75275     Person  [31-40]  Diabetes  2500
    75275     Person  [31-40]  Diabetes  2800
    75275     Person  [31-40]  Diabetes  2600


    Zipcode  Gender  Age      Disease
    83100*   Person  [25-30]  Flu
    82530*   Person  [10-15]  Obesity
    83400*   Person  [30-35]  Cancer
    83100*   Person  [25-30]  HIV+
    82530*   Person  [15-20]  Cancer
    83400*   Person  [30-35]  Diabetes
    82530*   Person  [25-30]  Obesity
    83100*   Person  [25-30]  Flu
    83400*   Person  [30-35]  Flu


    How do we decide the number of clusters?


    Distance between two numerical values
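The formula for this slide is not preserved in the transcript. A common choice in clustering-based k-anonymization, assumed here rather than taken from the slide, is the absolute difference of the two values divided by the width of the attribute's domain, which keeps every distance in [0, 1]. The function name and parameters below are hypothetical.

```python
def numeric_distance(v1, v2, domain_min, domain_max):
    """Assumed definition: |v1 - v2| scaled by the width of the attribute's domain."""
    width = domain_max - domain_min
    if width == 0:
        return 0.0  # a single-valued domain carries no distance
    return abs(v1 - v2) / width


# Ages taken from the example tables earlier in the deck.
ages = [22, 23, 24, 33, 38, 36]
print(numeric_distance(22, 38, min(ages), max(ages)))  # 16 / 16 = 1.0
print(numeric_distance(22, 30, min(ages), max(ages)))  # 8 / 16 = 0.5
```

Normalizing by the domain width lets distances on attributes with very different scales (age versus expense, for example) be added together.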


    Distance between two categorical values


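    Distance between two categorical values

    Taxonomy tree of Country, level by level:

    Country
    America, Asia
    North, South, East, West
    USA, Canada, Brazil, Mexico, India, Egypt, Iran, Pakistan

    Fig: Taxonomy tree of Country

    $\delta_C(v_i, v_j) = \frac{H(\Lambda(v_i, v_j))}{H(T_D)}$

    where $\Lambda(v_i, v_j)$ is the subtree rooted at the lowest common ancestor of $v_i$ and $v_j$, $T_D$ is the taxonomy tree of the domain, and $H(T)$ is the height of tree $T$.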
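A small sketch of this distance, reading the formula as the height of the smallest subtree containing both values divided by the height of the whole taxonomy tree. The transcript does not show which countries hang under which region, so the leaf placement below is an assumption (leaves paired in the order they are listed); all function names are hypothetical.

```python
# Child -> parent links for the Country taxonomy.
# ASSUMPTION: leaves are attached to regions in the order listed on the slide.
PARENT = {
    "America": "Country", "Asia": "Country",
    "North": "America", "South": "America",
    "East": "Asia", "West": "Asia",
    "USA": "North", "Canada": "North",
    "Brazil": "South", "Mexico": "South",
    "India": "East", "Egypt": "East",
    "Iran": "West", "Pakistan": "West",
}


def path_to_root(node):
    """Return the list of nodes from node up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def height(node):
    """Height of the subtree rooted at node (a leaf has height 0)."""
    children = [child for child, parent in PARENT.items() if parent == node]
    return 0 if not children else 1 + max(height(child) for child in children)


def categorical_distance(v1, v2, root="Country"):
    """Height of the lowest common subtree divided by the height of the full tree."""
    ancestors_of_v1 = path_to_root(v1)
    # The first node on v2's upward path that also lies on v1's path is the LCA.
    lca = next(node for node in path_to_root(v2) if node in ancestors_of_v1)
    return height(lca) / height(root)


print(categorical_distance("USA", "Canada"))  # same region: 1/3
print(categorical_distance("USA", "Brazil"))  # same continent: 2/3
print(categorical_distance("USA", "India"))   # only the root in common: 1.0
```

Values that share a deeper ancestor are closer, so generalizing them to that ancestor loses less information.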


    Function greedy_k_member_clustering (S, k)
      if ( |S| ≤ k )
        return S;
      end if;
      result = ∅; r = a randomly picked record from S;
      while ( |S| ≥ k )
        r = the furthest record from r;
        S = S - {r};
        c = {r};
        while ( |c| < k )
          r = find_best_record(S, c);
          S = S - {r};
          c = c ∪ {r};
        end while;
        result = result ∪ {c};
      end while;
      while ( |S| ≠ 0 )
        r = a randomly picked record from S;
        S = S - {r};
        c = find_best_cluster(result, r);
        c = c ∪ {r};
      end while;
      return result;


    Function find_best_record (S, c)
    Input: a set of records S and a cluster c.
    Output: a record r ∈ S such that IL(c ∪ {r}) is minimal.
      n = |S|; min = ∞; best = null;
      for ( i = 1..n )
        r = i-th record in S;
        diff = IL(c ∪ {r}) - IL(c);
        if ( diff < min )
          min = diff;
          best = r;
        end if;
      end for;
      return best;


    Function find_best_cluster (C, r)
    Input: a set of clusters C and a record r.
    Output: a cluster c ∈ C such that IL(c ∪ {r}) is minimal.
      n = |C|; min = ∞; best = null;
      for ( i = 1..n )
        c = i-th cluster in C;
        diff = IL(c ∪ {r}) - IL(c);
        if ( diff < min )
          min = diff;
          best = c;
        end if;
      end for;
      return best;
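The three functions can be sketched together in Python. This is an illustrative reading of the pseudocode, not the presenters' implementation. The slides do not define the information loss IL, so the sketch assumes numeric attributes only and takes IL(c) to be the cluster size times the sum of each attribute's range within the cluster, normalized by that attribute's range over the whole data set. The `seed` parameter and the helper names `information_loss` and `distance` are hypothetical.

```python
import random


def information_loss(cluster, widths):
    """ASSUMED IL: |c| times the sum of normalized per-attribute ranges in c."""
    if not cluster:
        return 0.0
    spread = sum(
        (max(r[i] for r in cluster) - min(r[i] for r in cluster)) / w
        for i, w in enumerate(widths)
    )
    return len(cluster) * spread


def distance(a, b, widths):
    """Sum of normalized absolute differences between two records."""
    return sum(abs(x - y) / w for x, y, w in zip(a, b, widths))


def find_best_record(records, cluster, widths):
    """Record whose addition to cluster increases IL the least."""
    base = information_loss(cluster, widths)
    return min(records, key=lambda r: information_loss(cluster + [r], widths) - base)


def find_best_cluster(clusters, record, widths):
    """Cluster whose IL increases the least when record is added."""
    return min(
        clusters,
        key=lambda c: information_loss(c + [record], widths) - information_loss(c, widths),
    )


def greedy_k_member_clustering(records, k, seed=0):
    rng = random.Random(seed)
    remaining = list(records)
    if len(remaining) <= k:
        return [remaining] if remaining else []
    # Attribute ranges over the whole data set; "or 1" avoids division by zero.
    widths = [
        (max(r[i] for r in remaining) - min(r[i] for r in remaining)) or 1
        for i in range(len(remaining[0]))
    ]
    result = []
    r = rng.choice(remaining)
    while len(remaining) >= k:
        previous = r
        r = max(remaining, key=lambda x: distance(x, previous, widths))  # furthest record
        remaining.remove(r)
        cluster = [r]
        while len(cluster) < k:
            r = find_best_record(remaining, cluster, widths)
            remaining.remove(r)
            cluster.append(r)
        result.append(cluster)
    while remaining:  # fewer than k records left: attach each to its best cluster
        r = remaining.pop(rng.randrange(len(remaining)))
        find_best_cluster(result, r, widths).append(r)
    return result


# (age, expense) pairs from the example table, plus one extra made-up record.
records = [(22, 100), (23, 3000), (24, 5000), (33, 2500), (38, 2800), (36, 2600), (29, 1500)]
for cluster in greedy_k_member_clustering(records, k=3):
    print(cluster)
```

Every returned cluster holds at least k records, so generalizing each cluster to its attribute ranges yields a k-anonymous table. Which records end up together depends on the random starting record, hence the fixed seed.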


    The time complexity of this algorithm is $O\left((n^2 \log n)/c\right)$, where c is the average number of records in each cluster.

    This is better than the time complexity of the greedy k-member algorithm.


    It is difficult to decide a proper value for the user-defined threshold.

    This algorithm might delete many records, which in turn causes a significant information loss.

    This algorithm is less sensitive to outliers.


    The main goal of the experiments was to investigate the implementation of the k-anonymity model using a clustering algorithm. We mainly focus on data quality, k-anonymization, and scalability, which are the main considerations of the k-anonymity model.


    Finally, data quality is the big problem in k-anonymization. We focus on data quality rather than computational efficiency, because data quality should be the main consideration in the k-anonymity model. We are encouraged by our results, which demonstrate that our algorithm is flexible and able to produce a range of desired anonymizations.


    Encouraged by the experimental results, we are currently working on more efficient heuristics to improve the performance of our approach.

    We are also working to apply this clustering algorithm to fraud detection.


    1. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557-570 (2002)

    2. Byun, J.-W., et al.: Efficient k-Anonymization Using Clustering Techniques. In: Kotagiri, R., et al. (Eds.): DASFAA 2007, LNCS 4443, pp. 188 ff. (2007)
