m-Invariance and Dynamic Datasets
based on:
Xiaokui Xiao, Yufei Tao: m-Invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets
Slawomir Goryczka
Panta rhei (Heraclitus): "everything is in a state of flux"
To provide the most recent anonymized data, the publisher needs to re-publish it
Most of the current approaches do not consider this!
Exception (supports only insertions of data): J.-W. Byun, Y. Sohn, E. Bertino, and N. Li, Secure anonymization for incremental datasets (2006)
Where is the problem?
Maybe it's simple?
We just need to ensure that:
The dataset is not published too often (movie effect)
We use a different algorithm for each dataset snapshot ("white" noise instead of the movie effect, but it may be used to identify part of the data!)
We play with the data to keep similar statistics of attribute values. But what about long-term trends, e.g. a flu pandemic, which change the global and local statistics of the data?
Deletion of tuples
Deletion of data may introduce critical absence:
Bob has dyspepsia
Solution(?)
Ignore deletions
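The inference behind a critical absence can be sketched as a set intersection: an adversary who knows that Bob appears in several releases intersects the sensitive values of every QI group that hosted him. The disease sets below are hypothetical illustration data (only "dyspepsia" comes from the slide), and `candidate_values` is a name chosen for this sketch.

```python
def candidate_values(signatures):
    """Intersect the sensitive-value sets of all QI groups that hosted one person."""
    candidates = set(signatures[0])
    for sig in signatures[1:]:
        candidates &= set(sig)
    return candidates


# Hypothetical sensitive values of Bob's QI group in two successive releases.
release_1 = {"dyspepsia", "bronchitis"}
# Assumed: the bronchitis tuple was deleted before the second release.
release_2 = {"dyspepsia", "gastritis"}

print(candidate_values([release_1, release_2]))  # {'dyspepsia'}
```

Because bronchitis is absent from the second group, only one value survives the intersection, which is exactly the disclosure the slide warns about.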
Counterfeit generalization
Add some counterfeit tuples to avoid critical absence
Publish number and location of these tuples (utility)
Counterfeit generalization (continued)
The key to preserving privacy is to ensure a certain invariance across all the quasi-identifier (QI) groups that a tuple (here: Bob's tuple) is generalized to in different snapshots
Existing generalization schemes are special cases of counterfeited generalization in which there are no counterfeits
Goal: minimize number of counterfeit tuples, but ensure privacy among all snapshots. How?
m-Invariance
m-unique: each QI group in the anonymized table T*(j) contains at least m tuples, all with different sensitive values
m-invariant:
T*(j) is m-unique for all 1 ≤ j ≤ n
For each tuple t, its generalized QI group has the same set of distinct sensitive values in every data snapshot where t appears
(For the generalized QI groups of a tuple, the set of distinct sensitive values is constant, so there are no problems with critical absence, but each tuple can belong to only a limited number of generalized QI groups)
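The two conditions above can be checked mechanically. The following is a minimal sketch under an assumed representation: a published table is a list of QI groups, and each group is a list of `(tuple_id, sensitive_value)` pairs. The function names and the sample data are hypothetical, not taken from the paper.

```python
def signature(group):
    """Set of distinct sensitive values in one QI group."""
    return frozenset(value for _, value in group)


def is_m_unique(table, m):
    """Every group has at least m tuples, all with different sensitive values."""
    return all(len(g) >= m and len(signature(g)) == len(g) for g in table)


def is_m_invariant(tables, m):
    """All snapshots are m-unique and each tuple always sees the same signature."""
    if not all(is_m_unique(t, m) for t in tables):
        return False
    seen = {}  # tuple_id -> signature of the first group that hosted it
    for table in tables:
        for group in table:
            sig = signature(group)
            for tuple_id, _ in group:
                if seen.setdefault(tuple_id, sig) != sig:
                    return False
    return True


t1 = [[("bob", "dyspepsia"), ("alice", "bronchitis")]]
t2_bad = [[("bob", "dyspepsia"), ("carl", "gastritis")]]
t2_ok = [[("bob", "dyspepsia"), ("counterfeit-1", "bronchitis")]]

print(is_m_invariant([t1, t2_bad], 2))  # False
print(is_m_invariant([t1, t2_ok], 2))   # True
```

In the second case a counterfeit tuple (given its own assumed id) restores the signature of Bob's group, so the sequence stays 2-invariant.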
Privacy disclosure risk
Privacy disclosure risk for tuple t: risk(t) = nis(t) / nrs
nis(t): number of reasonable surjective functions that correctly reconstruct t
nrs: number of all reasonable surjections
m-Invariance (properties)
If {T*(1), ..., T*(n)} is m-invariant, then risk(t) ≤ 1/m for every tuple t in T(j), 1 ≤ j ≤ n
If {T*(1), ..., T*(n-1)} is m-invariant, then {T*(1), ..., T*(n)} is also m-invariant if and only if:
T*(n) is m-unique
For any tuple t ∈ T(n-1) ∩ T(n), its generalized QI groups in snapshots T*(n-1) and T*(n) have the same signature (set of distinct sensitive values).
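The second property means that a publisher only has to compare the new snapshot with the previous one. A minimal sketch of that incremental check follows, assuming each table is a list of QI groups holding `(tuple_id, sensitive_value)` pairs; `can_publish` and `host_signatures` are hypothetical names introduced for this example.

```python
def signature(group):
    """Set of distinct sensitive values in one QI group."""
    return frozenset(value for _, value in group)


def host_signatures(table):
    """Map each tuple id to the signature of the group that hosts it."""
    return {tid: signature(g) for g in table for tid, _ in g}


def can_publish(prev_table, new_table, m):
    """Check T*(n) against T*(n-1) only, as the property above allows."""
    m_unique = all(len(g) >= m and len(signature(g)) == len(g) for g in new_table)
    prev, new = host_signatures(prev_table), host_signatures(new_table)
    # Only tuples present in both snapshots must keep their signature.
    return m_unique and all(prev[t] == new[t] for t in prev.keys() & new.keys())


prev = [[("bob", "dyspepsia"), ("alice", "bronchitis")]]
bad = [[("bob", "dyspepsia"), ("carl", "gastritis")]]
good = [[("bob", "dyspepsia"), ("counterfeit-1", "bronchitis")]]

print(can_publish(prev, bad, 2))   # False
print(can_publish(prev, good, 2))  # True
```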
m-Invariant algorithm
The n-th publication is allowed only if T(n)-T(n-1) is m-eligible, that is, at most 1/m of the tuples in T(n)-T(n-1) have an identical sensitive value
Algorithm (4 phases):
1. Division
2. Balancing
3. Assignment
4. Split
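The m-eligibility precondition stated above is a simple frequency test. A minimal sketch, assuming the sensitive values of the tuples in T(n)-T(n-1) are given as a plain list; the function name, the sample values, and the treatment of an empty list as eligible are assumptions of this example.

```python
from collections import Counter


def is_m_eligible(sensitive_values, m):
    """True if no sensitive value is held by more than 1/m of the tuples."""
    counts = Counter(sensitive_values)
    if not counts:
        return True  # assumption: an empty set of new tuples is trivially eligible
    # Integer comparison avoids floating-point division: max_count <= total / m.
    return max(counts.values()) * m <= len(sensitive_values)


new_tuples = ["flu", "flu", "gastritis", "dyspepsia"]
print(is_m_eligible(new_tuples, 2))  # True
print(is_m_eligible(new_tuples, 3))  # False
```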
m-Invariant algorithm (continued)
Division: group the tuples common to T*(n-1) and T(n) into buckets, so that tuples with the same signature share one bucket
Balancing: balance the number of tuples in the buckets, using counterfeits if necessary (counterfeits have no values for the QI attributes)
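The Division phase and the counterfeit demand of the Balancing phase can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes a table is a list of QI groups of `(tuple_id, sensitive_value)` pairs, it treats "balanced" as every sensitive value of the signature being held by the same number of tuples, and it fills gaps with counterfeits only. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def divide(prev_table, current_ids):
    """Bucket surviving tuples by the signature of their group in T*(n-1)."""
    buckets = defaultdict(list)
    for group in prev_table:
        sig = frozenset(value for _, value in group)
        for tid, value in group:
            if tid in current_ids:  # tuple still exists in T(n)
                buckets[sig].append((tid, value))
    return dict(buckets)


def counterfeits_needed(sig, bucket):
    """Per sensitive value, how many tuples are missing to balance the bucket."""
    counts = Counter(value for _, value in bucket)
    target = max(counts.values())
    return {v: target - counts[v] for v in sig if counts[v] < target}


prev_table = [
    [("bob", "dyspepsia"), ("alice", "bronchitis")],
    [("dan", "flu"), ("eve", "gastritis")],
]
buckets = divide(prev_table, current_ids={"bob", "dan", "eve"})  # alice was deleted
for sig, bucket in buckets.items():
    print(sorted(sig), counterfeits_needed(sig, bucket))
```

Bob's bucket lacks a bronchitis tuple after Alice's deletion, so the sketch reports a demand of one counterfeit for that value, while the second bucket is already balanced.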
m-Invariant algorithm (continued)
Assignment: add the tuples that are in T(n) but were not in T*(n-1), using steps similar to Division and Balancing
Split: split each bucket B into |B|/s generalized QI groups, where s (≥ m) is the number of values in the signature of B. Each group has s tuples, which take the s sensitive values in the signature, respectively.
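The Split phase can be illustrated with a naive sketch: it cuts a balanced bucket into |B|/s groups that each contain one tuple per sensitive value. It ignores the QI attributes entirely, so it says nothing about how to choose groups with low information loss. The function name, the `(tuple_id, sensitive_value)` representation, and the data are assumptions of this example.

```python
from collections import defaultdict


def split_bucket(bucket):
    """Split a balanced bucket into groups with one tuple per sensitive value."""
    if not bucket:
        return []
    by_value = defaultdict(list)
    for tid, value in bucket:
        by_value[value].append(tid)
    sizes = {len(ids) for ids in by_value.values()}
    if len(sizes) != 1:
        raise ValueError("bucket is not balanced")
    n_groups = sizes.pop()    # equals |B| / s
    values = sorted(by_value)  # the s values of the signature
    return [[(by_value[v][i], v) for v in values] for i in range(n_groups)]


bucket = [("t1", "flu"), ("t2", "gastritis"), ("t3", "flu"), ("t4", "gastritis")]
for group in split_bucket(bucket):
    print(group)
# [('t1', 'flu'), ('t2', 'gastritis')]
# [('t3', 'flu'), ('t4', 'gastritis')]
```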
Experiments
Datasets (Tooc, Tsal): 400k tuples (600k in total)
Attributes: Age, Gender, Education, Birthplace, Occupation, Salary
Pros and cons
Incremental
Small data disturbance
High data utility (measured as a median relative error for queries)
...
Preserving current statistics of attribute values – what if they change?
What about continuous attributes (numbers)?
...
Q & I*
* Ideas