Post on 25-Feb-2016

46 views

Category:

Documents


4 download

DESCRIPTION

Privacy-preserving Anonymization of Set Value Data. Manolis Terrovitis Institute for the Management of Information Systems (IMIS), RC Athena Nikos Mamoulis University of Hong Kong (HKU) Panos Kalnis King Abdullah University of Science and Technology (KAUST). Motivation. Helen. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1

Privacy-preserving Anonymization of Set Value Data

Manolis Terrovitis
Institute for the Management of Information Systems (IMIS), RC Athena

Nikos Mamoulis
University of Hong Kong (HKU)

Panos Kalnis
King Abdullah University of Science and Technology (KAUST)

Page 2

Motivation

Attacker can see up to m items
Any m items
No distinction between sensitive and non-sensitive items

[Figure: Helen buys Beer, 0% Milk, Pregnancy test]

Page 3

Motivation (cont.)

Database
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
….
Mary: Wine, Beer, Full-fat Milk

Published
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
….
tn: Wine, Beer, Full-fat Milk

Attacker
Find all transactions that contain Beer & 0% Milk

t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
….
tn: Wine, Beer, Milk
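The attack on this slide can be illustrated with a minimal sketch. It assumes the toy transactions shown on the slide and a hypothetical rule that replaces every milk variant with "Milk"; the helper names `generalize` and `matching` are illustrative and do not come from the talk.

```python
# Toy published data mirroring the slide (the elided rows are left out).
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Assumption: every milk variant is generalized to the single item "Milk".
MILK_VARIANTS = {"0% Milk", "2% Milk", "Full-fat Milk"}


def generalize(items):
    """Replace each milk variant with the generalized item "Milk"."""
    return {"Milk" if item in MILK_VARIANTS else item for item in items}


def matching(db, known_items):
    """Return ids of transactions that contain every item the attacker knows."""
    return sorted(tid for tid, items in db.items() if known_items <= items)


# Attacker knows that Helen bought Beer and 0% Milk.
before = matching(published, {"Beer", "0% Milk"})

# After generalization the same knowledge only maps to {Beer, Milk}.
generalized = {tid: generalize(items) for tid, items in published.items()}
after = matching(generalized, generalize({"Beer", "0% Milk"}))

print(before)  # ['t1']
print(after)   # ['t1', 'tn']
```

On the raw data the query isolates a single transaction; on the generalized data it matches two, so the attacker can no longer single out Helen's basket.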

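Page 4

k^m-anonymity

Set of items: $I = \{o_1, o_2, \ldots, o_{|I|}\}$
Transaction database: $D = \{t_1, t_2, \ldots, t_{|D|}\}$, $t_i \subseteq I$
Query terms: $qs \subseteq I$, $|qs| \le m$
Query result: $res = \{\, t \mid t \in D \wedge qs \subseteq t \,\}$

k^m-anonymity: $|res| = 0 \ \vee\ |res| \ge k$ for every such $qs$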
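The definition can be checked by brute force on small data. The sketch below is an illustration only: it assumes the database fits in memory, and the function name `is_km_anonymous` and the toy items are hypothetical. It counts only itemsets that actually occur, since an itemset with an empty result already satisfies the condition.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(db, k, m):
    """Check that every itemset of size <= m occurring in db occurs in >= k transactions."""
    for size in range(1, m + 1):
        counts = Counter()
        for t in db:
            # Sorting gives each itemset one canonical tuple representation.
            counts.update(combinations(sorted(t), size))
        if any(c < k for c in counts.values()):
            return False
    return True


# Hypothetical toy database over items a1, a2, b1, b2.
raw = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]

# The same database after generalizing a1 and a2 to A.
generalized = [{"A", "b1"}, {"A", "b1"}, {"A", "b2"}, {"A", "b2"}]

print(is_km_anonymous(raw, k=2, m=1))          # True
print(is_km_anonymous(raw, k=2, m=2))          # False
print(is_km_anonymous(generalized, k=2, m=2))  # True
```

The raw data protects against an attacker who knows one item but not two; generalizing a1 and a2 to A restores the guarantee for m=2.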

Page 5

Related Work: K-Anonymity [Swe02]

(a) Microdata

Age   ZipCode   Disease
42    25000     Flu
46    35000     AIDS
50    20000     Cancer
54    40000     Gastritis
48    50000     Dyspepsia
56    55000     Bronchitis

(b) 2-anonymous microdata

Age     ZipCode       Disease
42-46   25000-35000   Flu
42-46   25000-35000   AIDS
50-54   20000-40000   Cancer
50-54   20000-40000   Gastritis
48-56   50000-55000   Dyspepsia
48-56   50000-55000   Bronchitis

Quasi-identifier: Age, ZipCode

[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.

NOT suitable for high-dimensional data
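As a small illustration of the k-anonymity property on the generalized table, the sketch below groups rows by their quasi-identifier values and checks the group sizes. The function name `is_k_anonymous` and the column indexing are assumptions made for this example.

```python
from collections import Counter

# Rows of the 2-anonymous table: (Age range, ZipCode range, Disease).
rows = [
    ("42-46", "25000-35000", "Flu"),
    ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"),
    ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"),
    ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(table, qi_columns, k):
    """True if every combination of quasi-identifier values appears in >= k rows."""
    groups = Counter(tuple(row[i] for i in qi_columns) for row in table)
    return all(count >= k for count in groups.values())


# Columns 0 and 1 (Age, ZipCode) form the quasi-identifier.
print(is_k_anonymous(rows, qi_columns=(0, 1), k=2))  # True
print(is_k_anonymous(rows, qi_columns=(0, 1), k=3))  # False
```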

Page 6

Related Work: L-diversity in Transactions

[GTK08] G. Ghinita, Y. Tao, P. Kalnis, “On the Anonymization of Sparse High-Dimensional Data”, ICDE, 2008

Requires knowledge of (non)-sensitive attributes

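Page 7

Our Approach: Employs Generalization

Generalization hierarchy: $A = \{a_1, a_2\}$

Information loss:
$NCP(p) = \begin{cases} 0, & u_p \text{ is a leaf node} \\ |u_p| / |I|, & \text{otherwise} \end{cases}$
where $u_p$ is the hierarchy node to which item $p$ is generalized and $|u_p|$ is the number of leaves under it.

k=2, m=2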
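A minimal sketch of the information-loss measure, assuming NCP is 0 for a leaf and otherwise the fraction of leaf items covered by the generalized node. The two-level hierarchy over a1, a2, b1, b2 and the root name "ALL" are hypothetical and chosen only for illustration.

```python
# Hypothetical hierarchy, stored as child -> parent.
PARENT = {
    "a1": "A", "a2": "A",
    "b1": "B", "b2": "B",
    "A": "ALL", "B": "ALL",
}
LEAVES = {"a1", "a2", "b1", "b2"}  # the original item domain I


def leaves_under(node):
    """Return the set of leaf items in the subtree rooted at node."""
    if node in LEAVES:
        return {node}
    result = set()
    for child, parent in PARENT.items():
        if parent == node:
            result |= leaves_under(child)
    return result


def ncp(node):
    """Information loss of publishing an item as the hierarchy node `node`."""
    if node in LEAVES:
        return 0.0
    return len(leaves_under(node)) / len(LEAVES)


print(ncp("a1"))   # 0.0
print(ncp("A"))    # 0.5
print(ncp("ALL"))  # 1.0
```

Publishing an item unchanged costs nothing, while generalizing it all the way to the root loses all information about which item it was.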

Page 8

Lattice of Generalizations

Page 9

Optimal Algorithm

Page 10

Count Tree

[Figure: count tree for an example transaction t with items a1, a2, b1, b2 and generalized items A, B]

All generalized forms of the paths reside in the tree
We can easily find which anonymizations are needed
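One way to picture the count tree is as a trie whose paths are itemsets and whose node counts are supports. The sketch below is a simplified illustration, not the authors' data structure: it assumes a hypothetical one-level hierarchy, stores nodes as nested dictionaries, and skips subsets that pair an item with its own generalization.

```python
from itertools import combinations

# Hypothetical one-level hierarchy: item -> generalized item.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def extend(t):
    """Add the generalized form of every item to the transaction."""
    return set(t) | {PARENT[item] for item in t if item in PARENT}


def new_node():
    return {"count": 0, "children": {}}


def insert(root, itemset):
    """Walk or create the path for itemset and count it at the last node."""
    node = root
    for item in itemset:
        node = node["children"].setdefault(item, new_node())
    node["count"] += 1


def build_count_tree(db, m):
    """Insert every subset of size <= m of each extended transaction."""
    root = new_node()
    for t in db:
        items = sorted(extend(t))
        for size in range(1, m + 1):
            for subset in combinations(items, size):
                # Skip subsets that contain an item and its own generalization.
                if any(PARENT.get(item) in subset for item in subset):
                    continue
                insert(root, subset)
    return root


def support(root, itemset):
    """Number of transactions whose extended form contains itemset."""
    node = root
    for item in sorted(itemset):
        node = node["children"].get(item)
        if node is None:
            return 0
    return node["count"]


db = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}]
tree = build_count_tree(db, m=2)
print(support(tree, {"a1", "b1"}))  # 1
print(support(tree, {"A", "b1"}))   # 2
print(support(tree, {"A", "B"}))    # 3
```

Because generalized paths such as (A, b1) sit in the same tree as (a1, b1), comparing their counts against k shows directly which generalization would repair a rare itemset.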

Page 11

Apriori-based Anonymization

Global optimal vs. local optimal solution for each path

We examine the paths
  By size (Apriori principle)
  Paths with invalid nodes are skipped

Page 12

Apriori-based Anonymization

1. Initialize gen_map
2. For i := 1 to m do
   1. For all t ∈ D do
      1. Extend t according to gen_map
      2. Add all i-subsets of extended t to count-tree
   2. Check all paths in count-tree and update gen_map
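The loop above can be sketched in Python under strong simplifications, all of which are assumptions of this example rather than the authors' method: a plain counter stands in for the count-tree, each transaction is replaced by its generalized form instead of being extended, and a violating itemset is repaired greedily by lifting its first generalizable item one level, with no attempt to minimize information loss. The hierarchy and the name `apriori_anonymize` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy, stored as child -> parent.
PARENT = {
    "a1": "A", "a2": "A",
    "b1": "B", "b2": "B",
    "A": "ALL", "B": "ALL",
}


def ancestors(node):
    """Return node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def apriori_anonymize(db, k, m):
    """Return a global recoding map (item -> published item) for k^m-anonymity."""
    leaves = {item for t in db for item in t}
    gen_map = {leaf: leaf for leaf in leaves}  # 1. initialize gen_map
    for i in range(1, m + 1):                  # 2. itemset sizes 1..m
        while True:
            counts = Counter()
            for t in db:
                generalized = sorted({gen_map[item] for item in t})
                counts.update(combinations(generalized, i))
            violating = [s for s, c in counts.items() if c < k]
            if not violating:
                break
            # Greedy repair: lift the first generalizable item one level.
            candidates = [x for x in violating[0] if x in PARENT]
            if not candidates:
                break  # everything is at the root; nothing left to generalize
            target = PARENT[candidates[0]]
            for leaf in leaves:
                if target in ancestors(leaf):
                    gen_map[leaf] = target     # global recoding of the subtree
    return gen_map


db = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
gen_map = apriori_anonymize(db, k=2, m=2)
published = [sorted({gen_map[item] for item in t}) for t in db]
print(published)  # [['A', 'b1'], ['A', 'b1'], ['A', 'b2'], ['A', 'b2']]
```

Handling sizes in increasing order means problems found among single items are fixed before pairs are counted, so fewer and more general itemsets have to be examined at each later step.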

Page 13

Small Datasets (2-15K, BMS-WebView2)

|I|=40..60, k=100, m=3

Page 14

Small Datasets (BMS-WebView2)

|D|=10K, k=100, m=1..4

Page 15

Apriori Anonymization for Large Datasets

k=5, m=3

|D|    |I|
515K   1657
59K    497
77K    3340

Running times: 500 sec, 10 sec, 100 sec

Page 16

Points to Remember

Anonymization of Transactional Data
  Attacker knows m items
  Any m items can be the quasi-identifier

Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss

Extensions (VLDBJ 2010)
  Local recoding (sort by Gray order and partition)
  Global recoding (by partitioning the data domain)