Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis
University of Hong Kong
Panos Kalnis
National University of Singapore
www.comp.nus.edu.sg/~kalnis
Motivation
Attacker can see up to m items
  Any m items
  No distinction between sensitive and non-sensitive items
[Figure: Helen's purchases: 0% Milk, Pregnancy test, Beer]
Motivation (cont.)
Database:
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
….
Mary: Wine, Beer, Full-fat Milk

Published:
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
….
tn: Wine, Beer, Full-fat Milk

Attacker: Find all transactions that contain Beer & 0% Milk

Published with generalized items:
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
….
tn: Wine, Beer, Milk
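
The attack on this slide can be replayed in a few lines. The sketch below uses only the four transactions visible on the slide (the elided rows are left out), and the `GENERALIZE` map is an assumption inferred from the second published listing, where every milk variant appears as plain "Milk". The helper names are illustrative.

```python
# Transactions as shown on the slide (elided rows omitted).
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Assumed generalization map: every milk variant becomes "Milk".
GENERALIZE = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}


def generalize(transaction):
    """Replace each item by its generalized form, if it has one."""
    return {GENERALIZE.get(item, item) for item in transaction}


def matching(db, known_items):
    """Return the ids of all transactions that contain every known item."""
    return sorted(tid for tid, items in db.items() if known_items <= items)


generalized = {tid: generalize(items) for tid, items in published.items()}

print(matching(published, {"Beer", "0% Milk"}))  # ['t1']
print(matching(generalized, {"Beer", "Milk"}))   # ['t1', 'tn']
```

On the original data the attacker's two known items single out t1. After generalization the same knowledge matches two transactions.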
k^m-anonymity
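Set of items: $I = \{o_1, o_2, \ldots, o_{|I|}\}$
Transaction: $t \subseteq I$
Database: $D = \{t_1, t_2, \ldots, t_{|D|}\}$
Query terms: $qs \subseteq I$, $|qs| \le m$
Query result: $res = \{\, t \mid t \in D \wedge qs \subseteq t \,\}$
k^m-anonymity: $|res| = 0 \;\vee\; |res| \ge k$ for every such $qs$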
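
A minimal sketch of how the definition could be checked by brute force: a database is k^m-anonymous if every set of at most m items is contained in either no transaction or at least k transactions. The function name and the toy database are illustrative assumptions, and the enumeration grows quickly with m, so this is only meant for small inputs.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(db, k, m):
    """Check k^m-anonymity by counting the support of every itemset of size <= m.

    Only itemsets that occur in at least one transaction are counted;
    any other query returns an empty result, which the definition allows.
    """
    support = Counter()
    for transaction in db:
        items = sorted(set(transaction))
        for size in range(1, m + 1):
            for qs in combinations(items, size):
                support[qs] += 1
    return all(count >= k for count in support.values())


# Toy database (hypothetical data, not from the slides).
db = [
    {"Beer", "Milk"},
    {"Beer", "Milk"},
    {"Beer", "Cola"},
    {"Beer", "Cola"},
]
print(is_km_anonymous(db, k=2, m=2))  # True
print(is_km_anonymous(db, k=3, m=2))  # False
```

In the toy data every single item and every pair that occurs at all occurs at least twice, so the check passes for k = 2 and fails for k = 3.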
Related Work: K-Anonymity [Swe02]
Age ZipCode Disease
42 25000 Flu
46 35000 AIDS
50 20000 Cancer
54 40000 Gastritis
48 50000 Dyspepsia
56 55000 Bronchitis
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
(a) Microdata
Quasi-identifier: Age, ZipCode
Age ZipCode Disease
42-46 25000-35000 Flu
42-46 25000-35000 AIDS
50-54 20000-40000 Cancer
50-54 20000-40000 Gastritis
48-56 50000-55000 Dyspepsia
48-56 50000-55000 Bronchitis
(b) 2-anonymous microdata
Not suitable for high-dimensional data
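
As a quick illustration, the 2-anonymous table can be verified by grouping rows on the quasi-identifier columns and checking the size of each group. The rows below are copied from the generalized table; the function name and the column indices are illustrative assumptions.

```python
from collections import Counter

# Rows of the generalized table: (Age, ZipCode, Disease).
table = [
    ("42-46", "25000-35000", "Flu"),
    ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"),
    ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"),
    ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(rows, qi_columns, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[i] for i in qi_columns) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(table, qi_columns=(0, 1), k=2))  # True
print(is_k_anonymous(table, qi_columns=(0, 1), k=3))  # False
```

With transactional data every item is a potential quasi-identifier column, which is why this grouping approach does not carry over directly.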
Related Work: L-diversity in Transactions
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
Requires knowing which attributes are sensitive and which are non-sensitive
Our Approach: Employs Generalization
[Figure: Generalization hierarchy; $A = \{a_1, a_2\}$]
Information loss:
$NCP(p) = \begin{cases} 0, & p \text{ is a leaf node} \\ |u_p| / |I|, & \text{otherwise} \end{cases}$
k = 2, m = 2
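
The information-loss measure can be sketched as follows, assuming NCP is 0 for an item published as a leaf and otherwise the fraction of leaf items covered by the generalized node. The hierarchy in `PARENT` is hypothetical (only a node A over a1 and a2 is suggested by the slide), and the helper names are illustrative.

```python
# Hypothetical hierarchy, stored as child -> parent. Leaves are original items.
PARENT = {
    "a1": "A", "a2": "A",
    "b1": "B", "b2": "B",
    "A": "ALL", "B": "ALL",
}


def leaves_under(node):
    """Return the set of leaf items in the subtree rooted at node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child)
    return result


ALL_ITEMS = leaves_under("ALL")  # plays the role of the item domain I


def ncp(node):
    """NCP of publishing an item as `node`: 0 for a leaf, else covered leaves / |I|."""
    if node not in PARENT.values():  # no children, so it is a leaf
        return 0.0
    return len(leaves_under(node)) / len(ALL_ITEMS)


print(ncp("a1"))   # 0.0
print(ncp("A"))    # 0.5
print(ncp("ALL"))  # 1.0
```

Generalizing higher in the hierarchy covers more leaves, so the loss rises from 0 at the leaves to 1 at the root.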
Apriori Anonymization for Large Datasets
[Figure: running times marked at 500 sec, 10 sec, and 100 sec]
|D| |I|
515K 1657
59K 497
77K 3340
k = 5, m = 3
Points to Remember
Anonymization of Transactional Data
  Attacker knows m items
  Any m items can be the quasi-identifier
Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss
On-going work
  Local recoding (sort by Gray order and partition)
  Transactional data in streaming environments