Beyond k-Anonymity

Arik Friedman, November 2008. Seminar in Databases (236826).


Page 1: Beyond k-Anonymity

Beyond k-Anonymity

Arik Friedman
November 2008

Seminar in Databases (236826)

Page 2: Beyond k-Anonymity

Outline

• Recap: privacy and k-anonymity
• l-diversity (beyond k-anonymity)
• t-closeness (beyond k-anonymity and l-diversity)
• Privacy?

Page 3: Beyond k-Anonymity

Recap: k-Anonymity

Using medical data without disclosing patients’ identity.

The problem: an attacker can link the released data with external data.

[Figure: two overlapping attribute sets]
• Medical data: Ethnicity, Visit date, Diagnosis, Procedure, Medication, Total charge
• Voter list: Name, Address, Date registered, Party affiliation, Date last voted
• Quasi-identifier (attributes shared by both): Zip, Birthdate, Gender
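The linking attack described on this slide can be sketched as a join on the quasi-identifier. This is only an illustration: the records, the names and the helper `link` are invented here and do not come from the slides.

```python
# Hypothetical records; the field names follow the attribute lists above,
# but every value is made up for illustration.
medical = [
    {"zip": "02138", "birthdate": "1945-07-21", "gender": "F", "diagnosis": "Cancer"},
    {"zip": "02139", "birthdate": "1960-01-02", "gender": "M", "diagnosis": "Flu"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "birthdate": "1945-07-21", "gender": "F"},
    {"name": "Bob Example", "zip": "02141", "birthdate": "1971-03-15", "gender": "M"},
]

QUASI_IDENTIFIER = ("zip", "birthdate", "gender")


def link(released, external, qi=QUASI_IDENTIFIER):
    """Join two tables on the quasi-identifier and return re-identified rows."""
    index = {}
    for row in external:
        index.setdefault(tuple(row[a] for a in qi), []).append(row)
    matches = []
    for row in released:
        candidates = index.get(tuple(row[a] for a in qi), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link(medical, voters)
print(reidentified)
```

Neither table contains a name next to a diagnosis, yet the join attaches one to the other whenever the quasi-identifier values are unique.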

Page 4: Beyond k-Anonymity

k-Anonymity: Formal Definition

• RT: released table
• (A1, A2, …, An): attributes
• QI_RT: quasi-identifier
• RT[QI_RT]: projection of RT on QI_RT

RT satisfies k-anonymity if every combination of values in RT[QI_RT] appears at least k times in RT[QI_RT].
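A minimal sketch of the k-anonymity test, assuming the standard definition (every combination of quasi-identifier values in the released table occurs at least k times). The function name `is_k_anonymous` and the toy rows are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def is_k_anonymous(table, quasi_identifier, k):
    """Return True if every quasi-identifier value combination occurs >= k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(n >= k for n in counts.values())


# Toy released table RT; the values are illustrative only.
rt = [
    {"zip": "130**", "age": "<30", "condition": "Heart Disease"},
    {"zip": "130**", "age": "<30", "condition": "Viral Infection"},
    {"zip": "1485*", "age": ">=40", "condition": "Cancer"},
    {"zip": "1485*", "age": ">=40", "condition": "Flu"},
]
qi_rt = ("zip", "age")
print(is_k_anonymous(rt, qi_rt, 2))
print(is_k_anonymous(rt, qi_rt, 3))
```

The check only looks at the projection on the quasi-identifier; the sensitive column plays no role, which is exactly the weakness the later slides exploit.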

Page 5: Beyond k-Anonymity

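Example: original data

Non-sensitive data: ZIP, Age, Nationality. Sensitive data: Condition.

| # | ZIP | Age | Nationality | Condition |
|---|---|---|---|---|
| 1 | 13053 | 28 | Russian | Heart Disease |
| 2 | 13068 | 29 | American | Heart Disease |
| 3 | 13068 | 21 | Japanese | Viral Infection |
| 4 | 13053 | 23 | American | Viral Infection |
| 5 | 14853 | 50 | Indian | Cancer |
| 6 | 14853 | 55 | Russian | Heart Disease |
| 7 | 14850 | 47 | American | Viral Infection |
| 8 | 14850 | 49 | American | Viral Infection |
| 9 | 13053 | 31 | American | Cancer |
| 10 | 13053 | 37 | Indian | Cancer |
| 11 | 13068 | 36 | Japanese | Cancer |
| 12 | 13068 | 35 | American | Cancer |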

Page 6: Beyond k-Anonymity

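Example - 4-anonymized Table (step 1: Nationality suppressed)

Non-sensitive data: ZIP, Age, Nationality. Sensitive data: Condition.

| # | ZIP | Age | Nationality | Condition |
|---|---|---|---|---|
| 1 | 13053 | 28 | * | Heart Disease |
| 2 | 13068 | 29 | * | Heart Disease |
| 3 | 13068 | 21 | * | Viral Infection |
| 4 | 13053 | 23 | * | Viral Infection |
| 5 | 14853 | 50 | * | Cancer |
| 6 | 14853 | 55 | * | Heart Disease |
| 7 | 14850 | 47 | * | Viral Infection |
| 8 | 14850 | 49 | * | Viral Infection |
| 9 | 13053 | 31 | * | Cancer |
| 10 | 13053 | 37 | * | Cancer |
| 11 | 13068 | 36 | * | Cancer |
| 12 | 13068 | 35 | * | Cancer |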

Page 7: Beyond k-Anonymity

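Example - 4-anonymized Table (step 2: Age generalized)

Non-sensitive data: ZIP, Age, Nationality. Sensitive data: Condition.

| # | ZIP | Age | Nationality | Condition |
|---|---|---|---|---|
| 1 | 13053 | <30 | * | Heart Disease |
| 2 | 13068 | <30 | * | Heart Disease |
| 3 | 13068 | <30 | * | Viral Infection |
| 4 | 13053 | <30 | * | Viral Infection |
| 5 | 14853 | ≥40 | * | Cancer |
| 6 | 14853 | ≥40 | * | Heart Disease |
| 7 | 14850 | ≥40 | * | Viral Infection |
| 8 | 14850 | ≥40 | * | Viral Infection |
| 9 | 13053 | 3* | * | Cancer |
| 10 | 13053 | 3* | * | Cancer |
| 11 | 13068 | 3* | * | Cancer |
| 12 | 13068 | 3* | * | Cancer |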

Page 8: Beyond k-Anonymity

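Example - 4-anonymized Table (step 3: ZIP generalized)

Non-sensitive data: ZIP, Age, Nationality. Sensitive data: Condition.

| # | ZIP | Age | Nationality | Condition |
|---|---|---|---|---|
| 1 | 130** | <30 | * | Heart Disease |
| 2 | 130** | <30 | * | Heart Disease |
| 3 | 130** | <30 | * | Viral Infection |
| 4 | 130** | <30 | * | Viral Infection |
| 5 | 1485* | ≥40 | * | Cancer |
| 6 | 1485* | ≥40 | * | Heart Disease |
| 7 | 1485* | ≥40 | * | Viral Infection |
| 8 | 1485* | ≥40 | * | Viral Infection |
| 9 | 130** | 3* | * | Cancer |
| 10 | 130** | 3* | * | Cancer |
| 11 | 130** | 3* | * | Cancer |
| 12 | 130** | 3* | * | Cancer |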

Page 9: Beyond k-Anonymity

Example - 4-anonymized Table

(Same table as on Page 8.)

We have 4-anonymity!!! We have privacy!!!!

Page 10: Beyond k-Anonymity

Example - 4-anonymized Table

(Same table as on Page 8.)

Suppose the attacker knows the non-sensitive attributes of:

| Name | Zip | Age | Nationality |
|---|---|---|---|
| Umeko | 13068 | 21 | Japanese |
| Bob | 13053 | 31 | American |

And the fact that Japanese have a very low incidence of heart disease.

Page 11: Beyond k-Anonymity

Example - 4-anonymized Table

(Same table and attacker knowledge as on Page 10.)

• Bob has cancer! His record (ZIP 13053, age 31) must be in the block (130**, 3*), where every record has Cancer.
• Umeko has a viral infection! Her record (ZIP 13068, age 21) must be in the block (130**, <30), which contains only Heart Disease and Viral Infection, and heart disease is ruled out by the background knowledge.
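The two inferences can be replayed mechanically. The sketch below hard-codes the q*-blocks of the 4-anonymized table from Page 8; the helpers `generalize` and `possible_conditions` are hypothetical names introduced only for this illustration.

```python
# q*-blocks of the 4-anonymized table, keyed by generalized (ZIP, Age).
blocks = {
    ("130**", "<30"): ["Heart Disease", "Heart Disease",
                       "Viral Infection", "Viral Infection"],
    ("1485*", ">=40"): ["Cancer", "Heart Disease",
                        "Viral Infection", "Viral Infection"],
    ("130**", "3*"): ["Cancer", "Cancer", "Cancer", "Cancer"],
}


def generalize(zip_code, age):
    """Map raw values to the generalized form used in the released table."""
    zip_g = "130**" if zip_code.startswith("130") else zip_code[:4] + "*"
    if age < 30:
        age_g = "<30"
    elif age < 40:
        age_g = "3*"
    else:
        age_g = ">=40"
    return zip_g, age_g


def possible_conditions(zip_code, age, ruled_out=()):
    """Conditions still possible after applying background knowledge."""
    return set(blocks[generalize(zip_code, age)]) - set(ruled_out)


bob = possible_conditions("13053", 31)  # homogeneity attack
umeko = possible_conditions("13068", 21, ruled_out=["Heart Disease"])  # background knowledge
print(bob, umeko)
```

In both cases a single candidate value remains, even though each block contains four records.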

Page 12: Beyond k-Anonymity

k-Anonymity Drawbacks

Basic reasons for the leak:
• Sensitive attributes lack diversity in values
  • Homogeneity attack
• Attacker has additional background knowledge
  • Background knowledge attack

Hence a new solution has been proposed in addition to k-anonymity: l-diversity

Page 13: Beyond k-Anonymity

Adversary’s background knowledge

• Has access to the published table T* and knows that it is a generalization of some base table T
• Instance-level background knowledge:
  • Some individuals are present in the table.
  • Knowledge about sensitive attributes of specific individuals.
• Demographic background knowledge:
  • Partial knowledge about the distribution of sensitive and non-sensitive attributes in the population.
• Diversity in the sensitive attribute values should mitigate both!

Page 14: Beyond k-Anonymity

Some notation…

• T = {t_1, t_2, …, t_n}: a table with attributes A_1, A_2, …, A_m; a subset of some population Ω
• t[C] = (t[C_1], t[C_2], …, t[C_p]): projection of t onto a set of attributes C ⊆ A
• S ⊆ A: sensitive attributes
• QI ⊆ A: quasi-identifier attributes
• T*: anonymized table
• q*-block: the set of records that were generalized to the same value q* in T*

Page 15: Beyond k-Anonymity

Bayes Optimal Privacy

• Ideal notion of privacy: models background knowledge as a probability distribution over attributes
• Uses Bayesian inference techniques
• Simplifying assumptions:
  • A single, multi-dimensional quasi-identifier attribute Q
  • A single sensitive attribute S
  • T is a simple random sample from Ω
  • Adversary Alice knows the complete joint distribution f of Q and S (worst-case assumption)

Page 16: Beyond k-Anonymity

Bayes Optimal Privacy

Assume Bob appears in the generalized table T*.

Alice’s prior belief of Bob’s sensitive attribute:

$$\alpha_{(q,s)} = P_f\big(t[S] = s \mid t[Q] = q\big)$$

After seeing T*, Alice’s belief changes to its posterior value (or observed belief):

$$\beta_{(q,s,T^*)} = P_f\big(t[S] = s \mid t[Q] = q \wedge \exists t^* \in T^*,\ t^* \text{ generalizes } t\big)$$

We wouldn’t want Alice to learn “much”: $\alpha_{(q,s)} \approx \beta_{(q,s,T^*)}$

Page 17: Beyond k-Anonymity

Bayes Optimal Privacy - Example

Bob, Alice’s neighbor, is a 62-year-old state employee.

Alice’s prior belief: 10% of men over 60 have cancer:

$$\alpha_{(\text{age} \ge 60 \,\wedge\, \text{ZIPcode} = 02138,\ \text{cancer})} = \alpha_{(\text{age} \ge 60,\ \text{cancer})} = 0.1$$

In the k-anonymized GIC data T*, the following lines could relate to Bob:

| Age | Zipcode | Diagnosis |
|---|---|---|
| ≥60 | 02138 | Cancer |
| ≥60 | 02138 | Cancer |
| ≥60 | 02138 | Healthy |
| ≥60 | 02138 | Pneumonia |

Alice’s belief changes to its posterior value (2 of the 4 candidate records have Cancer):

$$\beta_{(\text{age} \ge 60 \,\wedge\, \text{ZIPcode} = 02138,\ \text{cancer},\ T^*)} = 0.5$$

Page 18: Beyond k-Anonymity

Bayes Optimal Privacy

Theorem 3.1:

$$\beta_{(q,s,T^*)} = \frac{n_{(q^*,s)}\,\dfrac{f(s \mid q)}{f(s \mid q^*)}}{\sum_{s' \in S} n_{(q^*,s')}\,\dfrac{f(s' \mid q)}{f(s' \mid q^*)}}$$

where $n_{(q^*,s')}$ is the number of tuples in T* with t*[Q] = q* and t*[S] = s’.
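The formula of Theorem 3.1 translates directly into a few lines of code. The sketch below is illustrative: the function name `observed_belief` and the distribution `f` are assumptions made here, with f(s|q) = f(s|q*) chosen so that the result reduces to the simple count ratio of the example on Page 17.

```python
def observed_belief(s, counts, f_given_q, f_given_qstar):
    """Posterior belief beta_(q,s,T*) as given by Theorem 3.1.

    counts[v]        -- n_(q*,v): tuples in the q*-block with sensitive value v
    f_given_q[v]     -- f(v | q)
    f_given_qstar[v] -- f(v | q*)
    """
    def weight(v):
        return counts[v] * f_given_q[v] / f_given_qstar[v]

    return weight(s) / sum(weight(v) for v in counts)


# q*-block from the example on Page 17: two Cancer, one Healthy, one Pneumonia.
counts = {"Cancer": 2, "Healthy": 1, "Pneumonia": 1}
# Assumption (not from the slides): generalization does not change the
# conditional distribution, i.e. f(s|q) = f(s|q*) for every s.
f = {"Cancer": 0.1, "Healthy": 0.8, "Pneumonia": 0.1}
beta = observed_belief("Cancer", counts, f_given_q=f, f_given_qstar=f)
print(beta)
```

When f(s|q) differs from f(s|q*), the ratio reweights each count, which is how background knowledge about the population enters the posterior.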

Page 19: Beyond k-Anonymity

Privacy principles

• Positive disclosure: the adversary can correctly identify the value of a sensitive attribute: $\exists q, s$ such that $\beta_{(q,s,T^*)} > 1 - \delta$ for a given $\delta$
• Negative disclosure: the adversary can correctly eliminate the value of a sensitive attribute: $\beta_{(q,s,T^*)} < \varepsilon$ for a given $\varepsilon$ and $\exists t \in T$ such that $t[Q] = q$ but $t[S] \ne s$

Page 20: Beyond k-Anonymity

Privacy principles

• Note: not all positive and negative disclosures are bad
  • If Alice already knew Bob has cancer, there is not much one can do!
• Uninformative principle: there should not be a large difference between the prior and posterior beliefs

Page 21: Beyond k-Anonymity

Bayes Optimal Privacy

Limitations in practice:
• Insufficient knowledge: the data publisher is unlikely to know f
• The publisher does not know how much the adversary actually knows
  • He may have instance-level knowledge
  • No way to model non-probabilistic knowledge
• Multiple adversaries having different levels of knowledge

Hence a practical definition is needed

Page 22: Beyond k-Anonymity

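l-diversity principle

Revisit:

$$\beta_{(q,s,T^*)} = \frac{n_{(q^*,s)}\,\dfrac{f(s \mid q)}{f(s \mid q^*)}}{\sum_{s' \in S} n_{(q^*,s')}\,\dfrac{f(s' \mid q)}{f(s' \mid q^*)}}$$

Positive disclosure can occur when one term dominates the sum, so that the posterior belief approaches 1:

$$\exists s,\ \forall s' \ne s:\quad n_{(q^*,s')}\,\frac{f(s' \mid q)}{f(s' \mid q^*)} \ll n_{(q^*,s)}\,\frac{f(s \mid q)}{f(s \mid q^*)}$$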

Page 23: Beyond k-Anonymity

l-diversity principle

Could occur due to a combination of:
• Lack of diversity
• Strong background knowledge

Mitigate by requiring “well-represented” sensitive values.

At least l-1 damaging pieces of background knowledge are required to succeed.

Page 24: Beyond k-Anonymity

l-diversity principle

• A q*-block is l-diverse if it contains at least l well-represented values for the sensitive attribute S.
• A table is l-diverse if every q*-block is l-diverse.
• Example: distinct l-diversity: there are at least l distinct values for the sensitive attribute in each q*-block.
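Distinct l-diversity, the simplest reading of "well-represented", can be checked by counting distinct sensitive values per q*-block. The following is a sketch with an invented toy table; `is_distinct_l_diverse` is a hypothetical helper name.

```python
from collections import defaultdict


def is_distinct_l_diverse(table, quasi_identifier, sensitive, l):
    """True if every q*-block has at least l distinct sensitive values."""
    values = defaultdict(set)
    for row in table:
        values[tuple(row[a] for a in quasi_identifier)].add(row[sensitive])
    return all(len(v) >= l for v in values.values())


# Toy anonymized table T*; the values are illustrative only.
t_star = [
    {"zip": "1305*", "age": "<=40", "condition": "Heart Disease"},
    {"zip": "1305*", "age": "<=40", "condition": "Viral Infection"},
    {"zip": "1305*", "age": "<=40", "condition": "Cancer"},
    {"zip": "1485*", "age": ">40", "condition": "Cancer"},
    {"zip": "1485*", "age": ">40", "condition": "Cancer"},
    {"zip": "1485*", "age": ">40", "condition": "Flu"},
]
print(is_distinct_l_diverse(t_star, ("zip", "age"), "condition", 2))
print(is_distinct_l_diverse(t_star, ("zip", "age"), "condition", 3))
```

Counting distinct values ignores how often each value occurs, which is the gap that the entropy and recursive variants address.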

Page 25: Beyond k-Anonymity

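Example: 3-distinct diverse Table

Non-sensitive data: ZIP, Age, Nationality. Sensitive data: Condition.

| # | ZIP | Age | Nationality | Condition |
|---|---|---|---|---|
| 1 | 1305* | <= 40 | * | Heart Disease |
| 2 | 1305* | <= 40 | * | Viral Infection |
| 3 | 1305* | <= 40 | * | Cancer |
| 4 | 1305* | <= 40 | * | Cancer |
| 5 | 1485* | >= 40 | * | Cancer |
| 6 | 1485* | >= 40 | * | Heart Disease |
| 7 | 1485* | >= 40 | * | Viral Infection |
| 8 | 1485* | >= 40 | * | Viral Infection |
| 9 | 1306* | <= 40 | * | Heart Disease |
| 10 | 1306* | <= 40 | * | Viral Infection |
| 11 | 1306* | <= 40 | * | Cancer |
| 12 | 1306* | <= 40 | * | Cancer |

We have 3-distinct diversity!!!

We have privacy!!!!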

Page 26: Beyond k-Anonymity

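Example - 3-distinct diverse table

Non-sensitive data: ZIP, Age, Nat. Sensitive data: Condition.

| # | ZIP | Age | Nat. | Condition |
|---|---|---|---|---|
| 1 | 130** | <30 | * | Heart Disease |
| 2 | 130** | <30 | * | Heart Disease |
| 3 | 130** | <30 | * | Viral Infection |
| 4 | 130** | <30 | * | Viral Infection |
| 5 | 130** | <30 | * | Viral Infection |
| 6 | 130** | <30 | * | Viral Infection |
| 7 | 130** | <30 | * | Viral Infection |
| 8 | 130** | <30 | * | Viral Infection |
| 9 | 130** | <30 | * | Viral Infection |
| 10 | 130** | <30 | * | Viral Infection |
| 11 | 130** | <30 | * | Viral Infection |
| 12 | 130** | <30 | * | Cancer |

Suppose the attacker knows the non-sensitive attributes of:

| Name | Zip | Age | Nationality |
|---|---|---|---|
| Umeko | 13068 | 21 | Japanese |

And the fact that Japanese have a very low incidence of heart disease.

Still very likely that Umeko has a viral infection! Once heart disease is ruled out, 9 of the 10 remaining records have Viral Infection.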

Page 27: Beyond k-Anonymity

Entropy l-diversity

A table is entropy l-diverse if for every q*-block:

$$-\sum_{s \in S} p_{(q^*,s)} \log p_{(q^*,s)} \ge \log(\ell)$$

where

$$p_{(q^*,s)} = \frac{n_{(q^*,s)}}{\sum_{s' \in S} n_{(q^*,s')}}$$

Example with 2 sensitive attribute values (entropy in base-10 logarithms; l = 10^entropy is the largest l that the block satisfies):

| p(S1) | p(S2) | Entropy | l |
|---|---|---|---|
| 1 | 0 | 0 | 1 |
| 0.9 | 0.1 | 0.14 | 1.38 |
| 0.8 | 0.2 | 0.22 | 1.65 |
| 0.7 | 0.3 | 0.27 | 1.84 |
| 0.6 | 0.4 | 0.29 | 1.96 |
| 0.5 | 0.5 | 0.3 | 2 |

Not feasible when one value is very common.
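The entropy condition can be checked per q*-block as sketched below. The helper names `effective_l` and `is_entropy_l_diverse` are assumptions for illustration; `effective_l` returns exp(entropy), the largest l a block satisfies, which does not depend on the logarithm base as long as the same base is used on both sides of the inequality.

```python
import math
from collections import Counter


def effective_l(sensitive_values):
    """Largest l for which a q*-block is entropy l-diverse: exp(entropy)."""
    counts = Counter(sensitive_values)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def is_entropy_l_diverse(blocks, l):
    """True if every q*-block has entropy >= log(l)."""
    return all(effective_l(b) >= l for b in blocks)


block = ["S1"] * 8 + ["S2"] * 2  # p(S1) = 0.8, p(S2) = 0.2
print(round(effective_l(block), 2))
print(is_entropy_l_diverse([block], 1.5))
```

For the 0.8 / 0.2 split the sketch gives the same value of l as the corresponding row of the example table.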

Page 28: Beyond k-Anonymity

Recursive (c,l)-diversity

• None of the sensitive values should occur too frequently.
• Let r_i be the number of occurrences of the i-th most frequent sensitive value in a q*-block.
• Given a constant c, recursive (c,l)-diversity is satisfied if

$$r_1 < c\,(r_\ell + r_{\ell+1} + \dots + r_m)$$

• For example, with 3 sensitive values (m = 3):
  • (2,2)-diversity: $r_1 < 2(r_2 + r_3)$
  • (2,3)-diversity: $r_1 < 2 r_3$. Equivalently: even if we eliminate a sensitive value, we still have (2,2)-diversity.
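A small sketch of the recursive (c,l) test for a single q*-block, following the inequality above. The function name and the sample block are illustrative assumptions.

```python
from collections import Counter


def is_recursive_cl_diverse(sensitive_values, c, l):
    """Check r_1 < c * (r_l + r_{l+1} + ... + r_m) for one q*-block."""
    # r[0] >= r[1] >= ... are the frequencies of the sensitive values.
    r = sorted(Counter(sensitive_values).values(), reverse=True)
    if len(r) < l:
        return False  # fewer than l distinct values: the tail sum is empty
    return r[0] < c * sum(r[l - 1:])


# Hypothetical q*-block with m = 3 sensitive values: r = (5, 3, 2).
block = ["Viral Infection"] * 5 + ["Cancer"] * 3 + ["Heart Disease"] * 2
print(is_recursive_cl_diverse(block, c=2, l=2))  # tests 5 < 2 * (3 + 2)
print(is_recursive_cl_diverse(block, c=2, l=3))  # tests 5 < 2 * 2
```

The block passes (2,2)-diversity but fails (2,3)-diversity, because removing the second most frequent value would leave the most frequent one too dominant.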

Page 29: Beyond k-Anonymity

An algorithm for l-diversity?

• Monotonicity property: if T* preserves privacy, then so does every generalization of it
  • Satisfied by k-anonymity
• Most k-anonymization algorithms work for any privacy measure that satisfies monotonicity: we can re-use previous algorithms directly
• Bayes optimal privacy is not monotonic
• l-diversity variants are monotonic!

Page 30: Beyond k-Anonymity

Mondrian(partition)
    if (no allowable multidimensional cut for partition)
        return φ : partition → summary
    else
        dim ← choose_dimension()
        fs ← frequency_set(partition, dim)
        splitVal ← find_median(fs)
        lhs ← {t ∈ partition : t.dim ≤ splitVal}
        rhs ← {t ∈ partition : t.dim > splitVal}
        return Mondrian(rhs) ∪ Mondrian(lhs)

[Figure: Mondrian partitioning of records plotted by Age (x-axis) and Weight (y-axis)]

Example: Mondrian-entropy diverse, l = 1.89 (for two sensitive values, equivalent to limiting the prevalence of either value to at most 2/3; also equivalent to recursive (2,2)-diversity)
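The Mondrian recursion can be sketched in runnable form as follows. This is a simplified illustration, not the published implementation: it assumes numeric attributes, treats a cut as allowable when both halves keep at least k records, and tries dimensions in order of decreasing value range. The records and the helper `summarize` are invented. By monotonicity, the `len(...) >= k` test could be swapped for an l-diversity test on each half.

```python
def mondrian(partition, dims, k):
    """Recursively split `partition` (a list of dict records) at the median.

    Returns a list of partitions, each with at least k records.
    """
    def span(dim):
        vals = [t[dim] for t in partition]
        return max(vals) - min(vals)

    # Try the dimension with the widest value range first (a common heuristic).
    for dim in sorted(dims, key=span, reverse=True):
        values = sorted(t[dim] for t in partition)
        split_val = values[(len(values) - 1) // 2]  # median
        lhs = [t for t in partition if t[dim] <= split_val]
        rhs = [t for t in partition if t[dim] > split_val]
        if len(lhs) >= k and len(rhs) >= k:  # allowable cut
            return mondrian(lhs, dims, k) + mondrian(rhs, dims, k)
    return [partition]  # no allowable cut: this partition becomes one q*-block


def summarize(partition, dims):
    """Generalize a partition to the (min, max) range of each dimension."""
    return {d: (min(t[d] for t in partition), max(t[d] for t in partition))
            for d in dims}


# Hypothetical (age, weight) records, loosely inspired by the figure's axes.
records = [{"age": a, "weight": w} for a, w in
           [(35, 50), (38, 60), (42, 55), (47, 80),
            (52, 65), (58, 70), (63, 75), (68, 85)]]
dims = ("age", "weight")
parts = mondrian(records, dims, k=2)
for part in parts:
    print(len(part), summarize(part, dims))
```

Each returned partition plays the role of one q*-block, and `summarize` produces the generalized ranges that would be published for it.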

Page 31: Beyond k-Anonymity

Experiments

• Used Incognito (a popular generalization algorithm)
• Adult dataset (Census data) from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Adult)

[Table: Adult Database description. Experiment results refer to the sensitive attribute marked in the table.]

Page 32: Beyond k-Anonymity

Experiments - Utility

Intuitively: “usefulness” of the l-diverse and k-anonymized tables. Used k, l = 2, 4, 6, 8.

[Figure: number of generalization steps that were performed vs. k, l]

[Figure: average size of q*-blocks generated (similar to C_AVG) vs. k, l]

Page 33: Beyond k-Anonymity

Example: 3-diverse Table

(Same table as on Page 25.)

We have 3-diversity!!! We have privacy!!!!

Page 34: Beyond k-Anonymity

Similarity attack

| Bob | Zip | Age |
|---|---|---|
| | 47678 | 27 |

A 3-diverse patient table:

| Zipcode | Age | Salary | Disease |
|---|---|---|---|
| 476** | 2* | 20K | Gastric Ulcer |
| 476** | 2* | 30K | Gastritis |
| 476** | 2* | 40K | Stomach Cancer |
| 4790* | ≥ 40 | 50K | Gastritis |
| 4790* | ≥ 40 | 100K | Flu |
| 4790* | ≥ 40 | 70K | Bronchitis |
| 476** | 3* | 60K | Bronchitis |
| 476** | 3* | 80K | Pneumonia |
| 476** | 3* | 90K | Stomach Cancer |

Conclusion
1. Bob’s salary is in [20K, 40K], which is relatively low.
2. Bob has some stomach-related disease.

l-diversity does not consider semantic meanings of sensitive values

l-diversity is insufficient to prevent attribute disclosure.

Page 35: Beyond k-Anonymity

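Skewness attack

Non-sensitive data: Age. Sensitive data: Condition.

| # | Age | Condition |
|---|---|---|
| 1 | <30 | Cancer |
| 2 | <30 | Cancer |
| 3 | <30 | Healthy |
| 4 | <30 | Healthy |
| 5 | 3* | Cancer |
| 6 | 3* | Healthy |
| 7 | 3* | Healthy |
| 8 | 3* | Healthy |
| 9 | 3* | Healthy |
| 10 | 30 | Healthy |
| 11 | 30 | Cancer |
| 12 | 30 | Cancer |
| 13 | 30 | Cancer |
| 14 | 30 | Cancer |

• Two sensitive values in the overall population: Cancer (1%) and Healthy (99%); this distribution is entropy l-diverse only up to l = 1.0576
• Rows 1-4: entropy l-diverse up to l = 2
• Rows 5-9: entropy l-diverse up to l = 1.65
• Rows 10-14: entropy l-diverse up to l = 1.65
• Rows 5-9 and rows 10-14 are equivalent in terms of l-diversity, but very different semantically
• Attacker learned a lot!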
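The figures on this slide can be recomputed from the class proportions. The sketch assumes they are the effective l of entropy l-diversity, i.e. exp(entropy) of each distribution; the helper name `effective_l_from_probs` is made up for this illustration.

```python
import math


def effective_l_from_probs(probabilities):
    """exp(entropy): the largest l for which entropy l-diversity holds."""
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(entropy)


population = effective_l_from_probs([0.01, 0.99])  # Cancer 1%, Healthy 99%
block_1 = effective_l_from_probs([2 / 4, 2 / 4])   # rows 1-4: 2 Cancer, 2 Healthy
block_2 = effective_l_from_probs([1 / 5, 4 / 5])   # rows 5-9: 1 Cancer, 4 Healthy
block_3 = effective_l_from_probs([4 / 5, 1 / 5])   # rows 10-14: 4 Cancer, 1 Healthy
print(round(population, 4), round(block_1, 2), round(block_2, 2), round(block_3, 2))
```

Rows 5-9 and rows 10-14 get the same score although the cancer rate is 20% in one and 80% in the other, against a 1% prior: entropy is blind to which value is the frequent one.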

Page 36: Beyond k-Anonymity

t-Closeness: the main idea

Rationale

A completely generalized table:

| Age | Zipcode | …… | Gender | Disease |
|---|---|---|---|---|
| * | * | …… | * | Flu |
| * | * | …… | * | Heart Disease |
| * | * | …… | * | Cancer |
| … | … | …… | … | … |
| * | * | …… | * | Gastritis |

| Belief | Knowledge |
|---|---|
| B0 | External knowledge |
| B1 | Overall distribution Q of sensitive values |

Page 37: Beyond k-Anonymity

t-Closeness: the main idea

Rationale

A released table:

| Age | Zipcode | …… | Gender | Disease |
|---|---|---|---|---|
| 2* | 479** | …… | Male | Flu |
| 2* | 479** | …… | Male | Heart Disease |
| 2* | 479** | …… | Male | Cancer |
| … | … | …… | … | … |
| ≥ 50 | 4766* | …… | * | Gastritis |

| Belief | Knowledge |
|---|---|
| B0 | External knowledge |
| B1 | Overall distribution Q of sensitive values |
| B2 | Distribution Pi of sensitive values in each equivalence class |

Page 38: Beyond k-Anonymity

t-Closeness: the main idea

Rationale

| Belief | Knowledge |
|---|---|
| B0 | External knowledge |
| B1 | Overall distribution Q of sensitive values |
| B2 | Distribution Pi of sensitive values in each equivalence class |

Observations
• Q should be treated as public
• Knowledge gain comes in two parts:
  • Whole population (from B0 to B1)
  • Specific individuals (from B1 to B2)
• We bound the knowledge gain between B1 and B2 instead

Principle
• The distance between Q and Pi should be bounded by a threshold t.

Page 39: Beyond k-Anonymity

t-closeness

An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t.

A table is said to have t-closeness if all equivalence classes have t-closeness.

A distance measure called Earth Mover’s Distance is used. It maintains monotonicity!

Page 40: Beyond k-Anonymity

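Example: t-closeness

Non-sensitive data: ZIP, Age. Sensitive data: Salary, Condition.

| # | ZIP | Age | Salary | Condition |
|---|---|---|---|---|
| 1 | 4767* | <= 40 | 3K | Gastric ulcer |
| 2 | 4767* | <= 40 | 5K | Stomach cancer |
| 3 | 4767* | <= 40 | 9K | Pneumonia |
| 4 | 4790* | >= 40 | 6K | Gastritis |
| 5 | 4790* | >= 40 | 11K | Flu |
| 6 | 4790* | >= 40 | 8K | Bronchitis |
| 7 | 4760* | <= 40 | 4K | Gastritis |
| 8 | 4760* | <= 40 | 7K | Bronchitis |
| 9 | 4760* | <= 40 | 10K | Stomach cancer |

We have 0.167-closeness w.r.t. Salary and 0.278-closeness w.r.t. Disease!!! We have privacy!!!!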
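The 0.167 figure for Salary can be reproduced with a short sketch. It assumes the ordered-distance form of the Earth Mover's Distance for numerical attributes, D = (1/(m-1)) Σ_i |r_1 + … + r_i| with r_i = p_i - q_i over the m sorted domain values; the function name `emd_ordered` is hypothetical. The 0.278 figure for Disease depends on a disease hierarchy that the slides do not show, so it is not recomputed here.

```python
from collections import Counter
from fractions import Fraction


def emd_ordered(block_values, all_values):
    """Earth Mover's Distance between a block's distribution P and the table's
    distribution Q, using the ordered distance for numerical attributes."""
    domain = sorted(set(all_values))
    p, q = Counter(block_values), Counter(all_values)
    total, running = Fraction(0), Fraction(0)
    for v in domain:
        # running = r_1 + ... + r_i, where r_i = p_i - q_i
        running += Fraction(p[v], len(block_values)) - Fraction(q[v], len(all_values))
        total += abs(running)
    return total / (len(domain) - 1)


# Salaries (in thousands) of the three equivalence classes in the table above.
classes = [[3, 5, 9], [6, 11, 8], [4, 7, 10]]
table = [s for c in classes for s in c]
t = max(emd_ordered(c, table) for c in classes)
print(float(t))
```

The table's closeness is the maximum distance over its equivalence classes; exact fractions are used so the result is not blurred by floating-point rounding.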

Page 41: Beyond k-Anonymity

Netflix privacy breach
(Robust De-anonymization of Large Sparse Datasets, Narayanan and Shmatikov, 2008)

• Released for the Netflix Prize contest
• 17,770 movie titles
• 480,189 users with random customer IDs
• Ratings: 1-5
• For each movie we have the ratings:
  • (MovieID, CustomerID, Rating, Date)
• Re-arrange by CustomerID:

| Movie | CustomerID | Rank | Date |
|---|---|---|---|
| The Godfather | 17236 | 4 | 20.5 |
| Quantum of Solace | 17236 | 2 | 20.11 |
| Hamlet | 17236 | 5 | 14.10 |
| The Scorpion King | 17236 | 1 | 12.8 |
| The profit | 17236 | 5 | 11.8 |

Page 42: Beyond k-Anonymity

Netflix privacy breach
(Robust De-anonymization of Large Sparse Datasets, Narayanan and Shmatikov, 2008)

Can be linked, e.g., with IMDB data, to re-identify individuals!

• Netflix data: the ratings table from Page 41
• IMDB data: [figure not preserved]

(This example is made up. Possibly, James Hitchcock has nothing to do with Netflix.)

Page 43: Beyond k-Anonymity

Epilogue

“You have zero privacy anyway. Get over it.”

Scott McNealy (Sun CEO, January 1999)

Page 44: Beyond k-Anonymity

HIPAA excerpt

Health Insurance Portability and Accountability Act of 1996

Page 45: Beyond k-Anonymity


Thank you!

Page 46: Beyond k-Anonymity

Bibliography

• “Mondrian Multidimensional k-Anonymity”, K. LeFevre, D.J. DeWitt, R. Ramakrishnan, 2006
• “l-diversity: Privacy beyond k-anonymity”, A. Machanavajjhala, Johannes Gehrke, Daniel Kifer, 2006
• “t-closeness: Privacy beyond k-anonymity and l-diversity”, Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian, 2006

Presentations:
• “Privacy In Databases”, B. Aditya Prakash
• “K-Anonymity and Other Cluster-Based Methods”, Ge. Ruan