Survey of Privacy Protection for Medical Data
Sumathie Sundaresan, Advisor: Dr. Huiping Guo
Abstract
Expanded scientific knowledge, combined with the development of the Internet and the widespread use of computers, has increased the need for strong privacy protection for medical records. We have all heard stories of harassment that resulted from the lack of adequate privacy protection for medical records.
"...medical information is routinely shared with and viewed by third parties who are not involved in patient care .... The American Medical Records Association has identified twelve categories of information seekers outside of the health care industry who have access to health care files, including employers, government agencies, credit bureaus, insurers, educational institutions, and the media."
Methods
• Generalization
• k-anonymity
• l-diversity
• t-closeness
• m-invariance
• Personalized Privacy Preservation
• Anatomy
Privacy preserving data publishing
Microdata:
Name Age Zipcode Disease
Bob 21 12000 dyspepsia
Alice 22 14000 bronchitis
Andy 24 18000 flu
David 23 25000 gastritis
Gary 41 20000 flu
Helen 36 27000 gastritis
Jane 37 33000 dyspepsia
Ken 40 35000 flu
Linda 43 26000 gastritis
Paul 52 33000 dyspepsia
Steve 56 34000 gastritis
Classification of Attributes
Key attribute: Name, Address, Cell Phone
• Attributes that can uniquely identify an individual directly.
• Always removed before release.
Quasi-identifier: 5-digit ZIP code, Birth date, Gender
• A set of attributes that can potentially be linked with external information to re-identify entities. According to the Census summary data in 1991, 87% of the population in the U.S. can be uniquely identified based on these attributes.
• Suppressed or generalized before release.
Classification of Attributes (Cont'd)
Sensitive attribute: medical record, wage, etc.
• Always released directly, because these attributes are what the researchers need.
• Which attributes count as sensitive depends on the requirements of the application.
Inference attack
Published table:
Age Zipcode Disease
21 12000 dyspepsia
22 14000 bronchitis
24 18000 flu
23 25000 gastritis
41 20000 flu
36 27000 gastritis
37 33000 dyspepsia
40 35000 flu
43 26000 gastritis
52 33000 dyspepsia
56 34000 gastritis
An adversary knows Bob's quasi-identifier (QI) attributes:
Name Age Zipcode
Bob 21 12000
Only one published tuple has Age 21 and Zipcode 12000, so the adversary learns that Bob has dyspepsia.
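The linking step in this inference attack can be sketched in Python. This is a minimal illustration that assumes the adversary's external knowledge is a simple name-to-(age, zipcode) lookup. The names `voter_list` and `link` are hypothetical.

```python
# Published table: QI attributes (age, zipcode) plus the sensitive attribute.
published = [
    (21, 12000, "dyspepsia"), (22, 14000, "bronchitis"), (24, 18000, "flu"),
    (23, 25000, "gastritis"), (41, 20000, "flu"), (36, 27000, "gastritis"),
    (37, 33000, "dyspepsia"), (40, 35000, "flu"), (43, 26000, "gastritis"),
    (52, 33000, "dyspepsia"), (56, 34000, "gastritis"),
]

# Hypothetical external knowledge held by the adversary: name -> (age, zipcode).
voter_list = {"Bob": (21, 12000)}

def link(name):
    """Return the diseases of every published tuple whose QI values match the target."""
    age, zipcode = voter_list[name]
    return [d for (a, z, d) in published if a == age and z == zipcode]

print(link("Bob"))  # ['dyspepsia']
```

A single match means the published table identifies Bob's disease exactly.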
Generalization
Transform the QI values into less specific forms. Generalizing the published table gives:
Age Zipcode Disease
[21, 22] [12k, 14k] dyspepsia
[21, 22] [12k, 14k] bronchitis
[23, 24] [18k, 25k] flu
[23, 24] [18k, 25k] gastritis
[36, 41] [20k, 27k] flu
[36, 41] [20k, 27k] gastritis
[37, 43] [26k, 35k] dyspepsia
[37, 43] [26k, 35k] flu
[37, 43] [26k, 35k] gastritis
[52, 56] [33k, 34k] dyspepsia
[52, 56] [33k, 34k] gastritis
Generalization (Cont'd)
After each QI value is transformed into a less specific form, consider the same adversary, who knows Bob's Age (21) and Zipcode (12000). Against the generalized table, the adversary can only place Bob in the first QI-group, [21, 22] [12k, 14k]. The tuples in that group carry dyspepsia and bronchitis.
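Interval generalization and the adversary's matching step can be sketched as follows. This assumes the grouping of tuples is already given, taken from the generalized table above. How a real algorithm chooses the groups is outside this sketch. The helpers `generalize` and `candidates` are hypothetical names.

```python
from collections import defaultdict

# (age, zipcode, disease, group_id); group ids follow the generalized table.
records = [
    (21, 12000, "dyspepsia", 1), (22, 14000, "bronchitis", 1),
    (24, 18000, "flu", 2), (23, 25000, "gastritis", 2),
    (41, 20000, "flu", 3), (36, 27000, "gastritis", 3),
    (37, 33000, "dyspepsia", 4), (40, 35000, "flu", 4), (43, 26000, "gastritis", 4),
    (52, 33000, "dyspepsia", 5), (56, 34000, "gastritis", 5),
]

def generalize(rows):
    """Replace each QI value with the [min, max] interval of its group."""
    ages, zips = defaultdict(list), defaultdict(list)
    for age, zipcode, _, gid in rows:
        ages[gid].append(age)
        zips[gid].append(zipcode)
    return [((min(ages[g]), max(ages[g])), (min(zips[g]), max(zips[g])), d)
            for _, _, d, g in rows]

def candidates(table, age, zipcode):
    """Diseases of all generalized tuples whose intervals cover the target's QI values."""
    return [d for (a_lo, a_hi), (z_lo, z_hi), d in table
            if a_lo <= age <= a_hi and z_lo <= zipcode <= z_hi]

table = generalize(records)
print(candidates(table, 21, 12000))  # ['dyspepsia', 'bronchitis']
```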
K-Anonymity
Sweeney came up with a formal protection model named k-anonymity.
What is k-anonymity?
A release provides k-anonymity if the information for each person contained in the release cannot be distinguished from at least k-1 other individuals whose information also appears in the release.
Example: Suppose you try to identify a man from a release, but the only information you have is his birth date and gender. If at least k people in the release match that information, the release is k-anonymous.
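A k-anonymity check can be written as a count over QI combinations. Below is a minimal sketch. It assumes each row is a tuple and `qi_indices` lists which positions hold the quasi-identifiers. The function name `is_k_anonymous` is hypothetical. The example rows are the generalized table from the Generalization slide.

```python
from collections import Counter

def is_k_anonymous(rows, qi_indices, k):
    """True if every combination of QI values occurs in at least k rows."""
    counts = Counter(tuple(row[i] for i in qi_indices) for row in rows)
    return all(c >= k for c in counts.values())

rows = [
    ("[21,22]", "[12k,14k]", "dyspepsia"), ("[21,22]", "[12k,14k]", "bronchitis"),
    ("[23,24]", "[18k,25k]", "flu"), ("[23,24]", "[18k,25k]", "gastritis"),
    ("[36,41]", "[20k,27k]", "flu"), ("[36,41]", "[20k,27k]", "gastritis"),
    ("[37,43]", "[26k,35k]", "dyspepsia"), ("[37,43]", "[26k,35k]", "flu"),
    ("[37,43]", "[26k,35k]", "gastritis"),
    ("[52,56]", "[33k,34k]", "dyspepsia"), ("[52,56]", "[33k,34k]", "gastritis"),
]
print(is_k_anonymous(rows, (0, 1), 2))  # True
print(is_k_anonymous(rows, (0, 1), 3))  # False
```

The generalized table is 2-anonymous, because its smallest QI-group has two tuples.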
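Attacks Against K-Anonymity
Unsorted Matching Attack
This attack is based on the order in which tuples appear in the released table. If several releases list tuples in the same order, an adversary can match rows by position across the releases.
Solution: randomly sort the tuples before releasing.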
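Randomizing tuple order before each release is short in Python. This sketch uses `random.SystemRandom`, which draws from the operating system's randomness source, so the order cannot be reproduced from a seed. That choice is an assumption; the slide only says "randomly sort". The helper name `release` is hypothetical.

```python
import random

def release(rows):
    """Return a copy of rows in random order, so tuple position carries no
    information that could be matched against another release."""
    shuffled = list(rows)
    random.SystemRandom().shuffle(shuffled)
    return shuffled

rows = [("[21,22]", "[12k,14k]", "dyspepsia"), ("[21,22]", "[12k,14k]", "bronchitis"),
        ("[23,24]", "[18k,25k]", "flu"), ("[23,24]", "[18k,25k]", "gastritis")]
published = release(rows)
```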
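Attacks Against K-Anonymity (Cont'd)
A 3-anonymous patient table:
Zipcode Age Disease
476** 2* Heart Disease
476** 2* Heart Disease
476** 2* Heart Disease
4790* ≥ 40 Flu
4790* ≥ 40 Heart Disease
4790* ≥ 40 Cancer
476** 3* Heart Disease
476** 3* Cancer
476** 3* Cancer
Bob: Zipcode 47678, Age 27
Carl: Zipcode 47673, Age 36
k-Anonymity does not provide privacy if:
• Sensitive values in an equivalence class lack diversity (Homogeneity Attack). Bob falls in the 476** 2* class, where every tuple has Heart Disease, so Bob's disease is disclosed.
• The attacker has background knowledge (Background Knowledge Attack). Carl falls in the 476** 3* class (Heart Disease, Cancer, Cancer). An attacker who knows that Carl is very unlikely to have heart disease concludes that Carl has cancer.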
A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006
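Both attacks start by locating the target's equivalence class. The sketch below matches suppressed patterns such as "476**" and "2*" against Bob's and Carl's concrete values. The pattern syntax handling, including treating "≥ 40" as ">=40", is an assumption made for illustration. `matches` and `candidate_diseases` are hypothetical helpers.

```python
from collections import Counter

# 3-anonymous table: (zipcode pattern, age pattern, disease).
table = [
    ("476**", "2*", "Heart Disease"), ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"), ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"), ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

def matches(pattern, value):
    """Match '476**'-style patterns (one '*' per suppressed digit) or '>=N' ranges."""
    if pattern.startswith(">="):
        return int(value) >= int(pattern[2:])
    return len(pattern) == len(value) and all(p in ("*", v) for p, v in zip(pattern, value))

def candidate_diseases(zipcode, age):
    """Sensitive values of the equivalence class the target falls into."""
    return [d for z, a, d in table if matches(z, zipcode) and matches(a, age)]

print(set(candidate_diseases("47678", "27")))      # {'Heart Disease'}: homogeneity attack
print(Counter(candidate_diseases("47673", "36")))  # Carl's class; background knowledge decides
```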
l-Diversity
Distinct l-diversity: each equivalence class has at least l well-represented sensitive values. In the distinct version, this means at least l distinct values.
Limitation example: one equivalence class has ten tuples. In the "Disease" attribute, one is "Cancer", one is "Heart Disease", and the remaining eight are "Flu". This satisfies distinct 3-diversity, but the attacker can still conclude that the target person's disease is "Flu" with 80% confidence.
A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006
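The limitation example reduces to two numbers: how many distinct values the class has, and the largest share held by any one value. A minimal sketch follows, with hypothetical helper names. Exact fractions are used so the 80% figure appears without rounding.

```python
from collections import Counter
from fractions import Fraction

def distinct_l(values):
    """Number of distinct sensitive values in one equivalence class."""
    return len(set(values))

def max_confidence(values):
    """Highest probability an adversary can assign to a single sensitive value."""
    counts = Counter(values)
    return Fraction(max(counts.values()), len(values))

eq_class = ["Cancer", "Heart Disease"] + ["Flu"] * 8
print(distinct_l(eq_class))      # 3
print(max_confidence(eq_class))  # 4/5
```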
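l-Diversity (Cont'd)
Entropy l-diversity: each equivalence class must not only have enough different sensitive values, but the different values must also be distributed evenly enough. Formally, for every equivalence class E,
$$-\sum_{s \in S} p(E,s)\,\log p(E,s) \ge \log l,$$
where p(E,s) is the fraction of tuples in E whose sensitive value is s.
This may sometimes be too restrictive. When some values are very common, the entropy of the entire table may be very low. This leads to a less conservative notion of l-diversity.
Recursive (c,l)-diversity: the most frequent value does not appear too frequently. If r_1 ≥ r_2 ≥ … ≥ r_m are the counts of the sensitive values in E, then
$$r_1 < c\,(r_l + r_{l+1} + \cdots + r_m).$$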
A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006
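Both refined definitions can be checked per equivalence class. The sketch below uses natural logarithms. The base does not matter, because both sides of the entropy test use the same base. A small tolerance absorbs floating-point error when the entropy equals log l exactly. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy_l_diverse(values, l, eps=1e-9):
    """Entropy l-diversity: -sum p*log(p) >= log(l) over the class's value distribution."""
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(values).values())
    return entropy >= math.log(l) - eps  # eps tolerates floating-point rounding

def recursive_cl_diverse(values, c, l):
    """Recursive (c,l)-diversity: r1 < c * (r_l + ... + r_m), counts sorted descending."""
    r = sorted(Counter(values).values(), reverse=True)
    if len(r) < l:
        return False
    return r[0] < c * sum(r[l - 1:])

eq_class = ["Cancer", "Heart Disease"] + ["Flu"] * 8
print(entropy_l_diverse(eq_class, 2))        # False: entropy is about 0.64 < log 2
print(recursive_cl_diverse(eq_class, 3, 2))  # False: 8 < 3 * (1 + 1) fails
```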
Limitations of l-Diversity
l-diversity may be difficult and unnecessary to achieve.
• A single sensitive attribute with two values, HIV positive (1%) and HIV negative (99%), which have very different degrees of sensitivity.
• l-diversity is unnecessary to achieve: 2-diversity is unnecessary for an equivalence class that contains only negative records.
• l-diversity is difficult to achieve: suppose there are 10000 records in total. To have distinct 2-diversity, every equivalence class needs at least one positive record, so there can be at most 10000 × 1% = 100 equivalence classes.
Limitations of l-Diversity(Cont’d)
l-diversity is insufficient to prevent attribute disclosure.
Skewness Attack: l-diversity does not consider the overall distribution of sensitive values.
• Two sensitive values: HIV positive (1%) and HIV negative (99%).
• Serious privacy risk: consider an equivalence class that contains an equal number of positive and negative records. It is 2-diverse, yet everyone in it has a 50% chance of being positive, compared with 1% in the overall population.
• l-diversity does not differentiate between:
  • Equivalence class 1: 49 positive + 1 negative
  • Equivalence class 2: 1 positive + 49 negative
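The skewness argument comes down to comparing each class's positive rate with the 1% baseline. Below is a minimal sketch using the counts from the slide. The helper name `positive_rate` is hypothetical.

```python
from fractions import Fraction

overall_positive = Fraction(1, 100)  # HIV-positive rate in the whole table

classes = {
    "class 1": {"positive": 49, "negative": 1},
    "class 2": {"positive": 1, "negative": 49},
}

def positive_rate(counts):
    """Adversary's belief that a member of this class is HIV positive."""
    return Fraction(counts["positive"], counts["positive"] + counts["negative"])

for name, counts in classes.items():
    # Both classes hold two distinct values, so both are distinct 2-diverse,
    # yet the belief moves very differently from the 1% baseline.
    print(name, len(counts) >= 2, positive_rate(counts))
# class 1 True 49/50
# class 2 True 1/50
```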
Limitations of l-Diversity (Cont'd)
Similarity Attack
Bob: Zip 47678, Age 27
A 3-diverse patient table:
Zipcode Age Salary Disease
476** 2* 3K Gastric Ulcer
476** 2* 4K Gastritis
476** 2* 5K Stomach Cancer
4790* ≥ 40 6K Gastritis
4790* ≥ 40 11K Flu
4790* ≥ 40 8K Bronchitis
476** 3* 7K Bronchitis
476** 3* 9K Pneumonia
476** 3* 10K Stomach Cancer
Conclusion:
1. Bob's salary is in [3k, 5k], which is relatively low.
2. Bob has some stomach-related disease.
l-diversity does not consider the semantic meanings of sensitive values, so it is insufficient to prevent attribute disclosure.
t-Closeness: A New Privacy Measure
Rationale
A completely generalized table:
Age Zipcode … Gender Disease
* * … * Flu
* * … * Heart Disease
* * … * Cancer
…
* * … * Gastritis
An observer starts with a prior belief B0 from external knowledge. The completely generalized table reveals only the overall distribution Q of sensitive values, which changes the belief to B1.
t-Closeness: A New Privacy Measure (Cont'd)
Rationale
A released table:
Age Zipcode … Gender Disease
2* 479** … Male Flu
2* 479** … Male Heart Disease
2* 479** … Male Cancer
…
≥ 50 4766* … * Gastritis
The released table also reveals the distribution Pi of sensitive values in each equivalence class, which changes the belief from B1 to B2.
t-Closeness: A New Privacy Measure (Cont'd)
Rationale: external knowledge gives belief B0, the overall distribution Q of sensitive values gives B1, and the distribution Pi of sensitive values in each equivalence class gives B2.
Observations
• Q should be public.
• Knowledge gain comes in two parts: about the whole population (from B0 to B1) and about specific individuals (from B1 to B2).
• We bound the knowledge gain between B1 and B2 instead.
Principle
• The distance between Q and each Pi should be bounded by a threshold t.
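How to calculate EMD
EMD for numerical attributes: the values v_1 < v_2 < … < v_m are ordered, and the distance between two values is the ordered distance
$$\mathrm{ordered\_dist}(v_i, v_j) = \frac{|i-j|}{m-1}.$$
Ordered distance is a metric: it is non-negative, symmetric, and satisfies the triangle inequality.
Let r_i = p_i - q_i. Then D[P,Q] is calculated as:
$$D[P,Q] = \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i} r_j\right|$$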
Earth Mover's Distance: Example
P1 = {3k, 4k, 5k} and Q = {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}. Move 1/9 probability for each of the following pairs:
• 3k->5k, 3k->4k: cost 1/9 × (2 + 1)/8
• 4k->8k, 4k->7k, 4k->6k: cost 1/9 × (4 + 3 + 2)/8
• 5k->11k, 5k->10k, 5k->9k: cost 1/9 × (6 + 5 + 4)/8
Total cost: 1/9 × 27/8 = 0.375.
With P2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This makes more sense than the other two distance measures (variational distance and Kullback-Leibler divergence), which ignore how close the salary values are to one another.
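The ordered-distance EMD formula and the threshold test can be checked against the salary example. This is a sketch under the assumption that every distribution is given over the same sorted domain of m values. Exact fractions reproduce 3/8 = 0.375 and 1/6 ≈ 0.167. The names `ordered_emd`, `distribution` and `is_t_close` are hypothetical.

```python
from fractions import Fraction

salaries = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # salary domain in thousands, sorted

def distribution(values):
    """Empirical distribution of a set of salaries over the sorted domain."""
    return [Fraction(values.count(s), len(values)) for s in salaries]

def ordered_emd(p, q):
    """D[P,Q] = 1/(m-1) * sum_i |r_1 + ... + r_i|, where r_i = p_i - q_i."""
    m = len(q)
    total, cumulative = Fraction(0), Fraction(0)
    for pi, qi in zip(p, q):
        cumulative += pi - qi
        total += abs(cumulative)
    return total / (m - 1)

def is_t_close(classes, q, t):
    """t-closeness: every class's distribution is within EMD t of the overall Q."""
    return all(ordered_emd(distribution(c), q) <= t for c in classes)

Q = distribution(salaries)
print(ordered_emd(distribution([3, 4, 5]), Q))   # 3/8
print(ordered_emd(distribution([6, 8, 11]), Q))  # 1/6
```

On the 3-diverse salary table, with classes {3k, 4k, 5k}, {6k, 8k, 11k} and {7k, 9k, 10k}, `is_t_close` holds for t = 0.4 but fails for t = 0.3, because of the low-salary class.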
Motivating Example
A hospital keeps track of the medical records collected in the last three months. The microdata table T(1) and its generalization T*(1) are published in Apr. 2007.
Microdata T(1):
Name Age Zipcode Disease
Bob 21 12000 dyspepsia
Alice 22 14000 bronchitis
Andy 24 18000 flu
David 23 25000 gastritis
Gary 41 20000 flu
Helen 36 27000 gastritis
Jane 37 33000 dyspepsia
Ken 40 35000 flu
Linda 43 26000 gastritis
Paul 52 33000 dyspepsia
Steve 56 34000 gastritis
2-diverse generalization T*(1):
G.ID Age Zipcode Disease
1 [21, 22] [12k, 14k] dyspepsia
1 [21, 22] [12k, 14k] bronchitis
2 [23, 24] [18k, 25k] flu
2 [23, 24] [18k, 25k] gastritis
3 [36, 41] [20k, 27k] flu
3 [36, 41] [20k, 27k] gastritis
4 [37, 43] [26k, 35k] dyspepsia
4 [37, 43] [26k, 35k] flu
4 [37, 43] [26k, 35k] gastritis
5 [52, 56] [33k, 34k] dyspepsia
5 [52, 56] [33k, 34k] gastritis
Motivating Example (Cont'd)
Bob was hospitalized in Mar. 2007. An adversary who knows Bob's Age (21) and Zipcode (12000) finds him in group 1 of T*(1), [21, 22] [12k, 14k], so Bob has either dyspepsia or bronchitis.
Motivating Example (Cont'd)
One month later, in May 2007, some obsolete tuples are deleted from the microdata T(1). Bob's tuple stays. The remaining tuples are:
Name Age Zipcode Disease
Bob 21 12000 dyspepsia
David 23 25000 gastritis
Gary 41 20000 flu
Jane 37 33000 dyspepsia
Linda 43 26000 gastritis
Steve 56 34000 gastritis
Motivating Example (Cont'd)
Some new records are inserted, giving the microdata T(2):
Name Age Zipcode Disease
Bob 21 12000 dyspepsia
David 23 25000 gastritis
Emily 25 21000 flu
Jane 37 33000 dyspepsia
Linda 43 26000 gastritis
Gary 41 20000 flu
Mary 46 30000 gastritis
Ray 54 31000 dyspepsia
Steve 56 34000 gastritis
Tom 60 44000 gastritis
Vince 65 36000 flu
Motivating Example (Cont'd)
The hospital publishes T*(2), a 2-diverse generalization of T(2):
G.ID Age Zipcode Disease
1 [21, 23] [12k, 25k] dyspepsia
1 [21, 23] [12k, 25k] gastritis
2 [25, 43] [21k, 33k] flu
2 [25, 43] [21k, 33k] dyspepsia
2 [25, 43] [21k, 33k] gastritis
3 [41, 46] [20k, 30k] flu
3 [41, 46] [20k, 30k] gastritis
4 [54, 56] [31k, 34k] dyspepsia
4 [54, 56] [31k, 34k] gastritis
5 [60, 65] [36k, 44k] gastritis
5 [60, 65] [36k, 44k] flu
Motivating Example (Cont'd)
Consider the previous adversary, who knows Bob's Age (21) and Zipcode (12000). In T*(2), Bob falls in group 1, [21, 23] [12k, 25k], whose diseases are dyspepsia and gastritis.
Motivating Example (Cont'd)
What the adversary learns from T*(1): Bob (21, 12000) falls in group 1, [21, 22] [12k, 14k], with diseases dyspepsia and bronchitis.
What the adversary learns from T*(2): Bob falls in group 1, [21, 23] [12k, 25k], with diseases dyspepsia and gastritis.
So Bob must have contracted dyspepsia! A new generalization principle is needed.
The critical absence phenomenon
Bob's disease must appear both in his T*(1) group {dyspepsia, bronchitis} and in his T*(2) group {dyspepsia, gastritis}. T(2) contains no bronchitis tuple, because Alice's tuple was deleted, so the intersection leaves only dyspepsia. We refer to this as the critical absence phenomenon. A new generalization method is needed.
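The adversary's reasoning here is a set intersection. This sketch assumes, as in the example, that Bob's tuple and disease are unchanged between the two releases. The variable names are hypothetical.

```python
# Sensitive values of Bob's QI-group in each release.
t1_group = {"dyspepsia", "bronchitis"}  # group 1 of T*(1)
t2_group = {"dyspepsia", "gastritis"}   # group 1 of T*(2)

# Bob's disease did not change, so it must lie in both candidate sets.
print(t1_group & t2_group)  # {'dyspepsia'}
```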
Counterfeited generalization
To avoid critical absence, the publisher adds counterfeit tuples c1 and c2 to the generalization of the microdata T(2). It also publishes an auxiliary relation R(2) that records how many counterfeits each QI-group contains.
Counterfeited generalization T*(2):
Name G.ID Age Zipcode Disease
Bob 1 [21, 22] [12k, 14k] dyspepsia
c1 1 [21, 22] [12k, 14k] bronchitis
David 2 [23, 25] [21k, 25k] gastritis
Emily 2 [23, 25] [21k, 25k] flu
Jane 3 [37, 43] [26k, 33k] dyspepsia
c2 3 [37, 43] [26k, 33k] flu
Linda 3 [37, 43] [26k, 33k] gastritis
Gary 4 [41, 46] [20k, 30k] flu
Mary 4 [41, 46] [20k, 30k] gastritis
Ray 5 [54, 56] [31k, 34k] dyspepsia
Steve 5 [54, 56] [31k, 34k] gastritis
Tom 6 [60, 65] [36k, 44k] gastritis
Vince 6 [60, 65] [36k, 44k] flu
The auxiliary relation R(2) for T*(2):
Group-ID Count
1 1
3 1
Compare the counterfeited T*(2) with the generalization T*(1), shown here with names:
Name G.ID Age Zipcode Disease
Bob 1 [21, 22] [12k, 14k] dyspepsia
Alice 1 [21, 22] [12k, 14k] bronchitis
Andy 2 [23, 24] [18k, 25k] flu
David 2 [23, 24] [18k, 25k] gastritis
Gary 3 [36, 41] [20k, 27k] flu
Helen 3 [36, 41] [20k, 27k] gastritis
Jane 4 [37, 43] [26k, 35k] dyspepsia
Ken 4 [37, 43] [26k, 35k] flu
Linda 4 [37, 43] [26k, 35k] gastritis
Paul 5 [52, 56] [33k, 34k] dyspepsia
Steve 5 [52, 56] [33k, 34k] gastritis
An adversary who knows Bob (Age 21, Zipcode 12000) now sees {dyspepsia, bronchitis} in Bob's group in both T*(1) and the counterfeited T*(2). Intersecting the two releases therefore no longer reveals Bob's disease.
m-uniqueness
A generalized table T*(j) is m-unique if and only if:
• each QI-group in T*(j) contains at least m tuples, and
• all tuples in the same QI-group have different sensitive values.
For example, T*(1) is a 2-unique generalized table. Its QI-groups carry {dyspepsia, bronchitis}, {flu, gastritis}, {flu, gastritis}, {dyspepsia, flu, gastritis} and {dyspepsia, gastritis}.
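The two m-uniqueness conditions translate directly into a per-group check. The sketch below assumes each row is reduced to (group id, sensitive value). `is_m_unique` is a hypothetical name.

```python
from collections import defaultdict

def is_m_unique(rows, m):
    """rows: (group_id, sensitive_value). m-unique iff every QI-group has at
    least m tuples and no sensitive value repeats within a group."""
    groups = defaultdict(list)
    for gid, value in rows:
        groups[gid].append(value)
    return all(len(v) >= m and len(set(v)) == len(v) for v in groups.values())

t1 = [(1, "dyspepsia"), (1, "bronchitis"), (2, "flu"), (2, "gastritis"),
      (3, "flu"), (3, "gastritis"), (4, "dyspepsia"), (4, "flu"), (4, "gastritis"),
      (5, "dyspepsia"), (5, "gastritis")]
print(is_m_unique(t1, 2))  # True
print(is_m_unique(t1, 3))  # False
```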
Signature
The signature of an individual in a generalized table is the set of sensitive values in his or her QI-group.
• The signature of Bob in T*(1) is {dyspepsia, bronchitis}.
• The signature of Jane in T*(1) is {dyspepsia, flu, gastritis}.
T*(1):
Name G.ID Age Zipcode Disease
Bob 1 [21, 22] [12k, 14k] dyspepsia
Alice 1 [21, 22] [12k, 14k] bronchitis
… … … … …
Jane 4 [37, 43] [26k, 35k] dyspepsia
Ken 4 [37, 43] [26k, 35k] flu
Linda 4 [37, 43] [26k, 35k] gastritis
… … … … …
The m-invariance principle
A sequence of generalized tables T*(1), …, T*(n) is m-invariant if and only if:
• T*(1), …, T*(n) are m-unique, and
• each individual has the same signature in every generalized table in which s/he appears.
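Signatures and the m-invariance test can be sketched over the running example. Assumptions: each row is (name, group id, disease), and counterfeit tuples are named c1, c2 as on the slides. Counterfeits appear in only one release, so they never have to keep a signature across tables. All function names are hypothetical.

```python
from collections import defaultdict

def signatures(rows):
    """Map each name to the set of sensitive values in its QI-group."""
    groups = defaultdict(set)
    for _, gid, disease in rows:
        groups[gid].add(disease)
    return {name: frozenset(groups[gid]) for name, gid, _ in rows}

def is_m_unique(rows, m):
    groups = defaultdict(list)
    for _, gid, disease in rows:
        groups[gid].append(disease)
    return all(len(v) >= m and len(set(v)) == len(v) for v in groups.values())

def is_m_invariant(tables, m):
    """All tables m-unique, and every individual keeps one signature throughout."""
    if not all(is_m_unique(t, m) for t in tables):
        return False
    seen = {}
    for t in tables:
        for name, sig in signatures(t).items():
            if seen.setdefault(name, sig) != sig:
                return False
    return True

T1 = [("Bob", 1, "dyspepsia"), ("Alice", 1, "bronchitis"), ("Andy", 2, "flu"),
      ("David", 2, "gastritis"), ("Gary", 3, "flu"), ("Helen", 3, "gastritis"),
      ("Jane", 4, "dyspepsia"), ("Ken", 4, "flu"), ("Linda", 4, "gastritis"),
      ("Paul", 5, "dyspepsia"), ("Steve", 5, "gastritis")]
T2_counterfeit = [("Bob", 1, "dyspepsia"), ("c1", 1, "bronchitis"),
      ("David", 2, "gastritis"), ("Emily", 2, "flu"), ("Jane", 3, "dyspepsia"),
      ("c2", 3, "flu"), ("Linda", 3, "gastritis"), ("Gary", 4, "flu"),
      ("Mary", 4, "gastritis"), ("Ray", 5, "dyspepsia"), ("Steve", 5, "gastritis"),
      ("Tom", 6, "gastritis"), ("Vince", 6, "flu")]
T2_plain = [("Bob", 1, "dyspepsia"), ("David", 1, "gastritis"), ("Emily", 2, "flu"),
      ("Jane", 2, "dyspepsia"), ("Linda", 2, "gastritis"), ("Gary", 3, "flu"),
      ("Mary", 3, "gastritis"), ("Ray", 4, "dyspepsia"), ("Steve", 4, "gastritis"),
      ("Tom", 5, "gastritis"), ("Vince", 5, "flu")]

print(is_m_invariant([T1, T2_counterfeit], 2))  # True
print(is_m_invariant([T1, T2_plain], 2))        # False: Bob's signature changes
```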
Applying the principle to T*(1) and the counterfeited T*(2): both tables are 2-unique. Every individual who appears in both tables keeps the same signature. Bob keeps {dyspepsia, bronchitis}, Jane and Linda keep {dyspepsia, flu, gastritis}, David and Gary keep {flu, gastritis}, and Steve keeps {dyspepsia, gastritis}. Therefore the sequence T*(1), T*(2) is 2-invariant.
Motivation 1: Personalization
• Andy does not want anyone to know that he had a stomach problem.
• Sarah does not mind at all if others find out that she had flu.
An external database:
Name Age Sex Zipcode
Andy 4 M 12000
Bill 5 M 14000
Ken 6 M 18000
Nash 9 M 19000
Mike 7 M 17000
Alice 12 F 22000
Betty 19 F 24000
Linda 21 F 33000
Jane 25 F 34000
Sarah 28 F 37000
Mary 56 F 58000
A 2-diverse table:
Age Sex Zipcode Disease
[1, 5] M [10001, 15000] gastric ulcer
[1, 5] M [10001, 15000] dyspepsia
[6, 10] M [15001, 20000] pneumonia
[6, 10] M [15001, 20000] bronchitis
[11, 20] F [20001, 25000] flu
[11, 20] F [20001, 25000] pneumonia
[21, 60] F [30001, 60000] gastritis
[21, 60] F [30001, 60000] gastritis
[21, 60] F [30001, 60000] flu
[21, 60] F [30001, 60000] flu
Andy (Age 4, Zipcode 12000) falls in the first QI-group. Both of its diseases, gastric ulcer and dyspepsia, are stomach diseases, so the 2-diverse table still reveals that Andy had a stomach problem.
Motivation 2: Sensitive attribute (SA) generalization
Query: how many female patients are there with age above 30?
In the 2-diverse table from Motivation 1, four female tuples have age [21, 60]. Assuming their ages are spread uniformly over that range gives the estimate 4 × (60 - 30) / (60 - 20) = 3.
Real answer (from the external database): 1, only Mary, age 56.
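The estimate on this slide assumes ages are spread uniformly over the generalized interval. This sketch follows the slide's formula in treating [21, 60] as the continuous range (20, 60]. The true count uses the four female patients from the external database whose zipcodes fall in [30001, 60000]. `estimate` is a hypothetical helper.

```python
from fractions import Fraction

def estimate(n, lo, hi, threshold):
    """Expected number of the n tuples with age > threshold, assuming ages are
    uniform over the continuous range (lo, hi]."""
    overlap = max(0, hi - max(lo, threshold))
    return Fraction(n * overlap, hi - lo)

real_ages = [21, 25, 28, 56]  # Linda, Jane, Sarah, Mary in the external database
print(estimate(4, 20, 60, 30))          # 3
print(sum(a > 30 for a in real_ages))   # 1
```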
Motivation 2: SA generalization (Cont'd)
Generalization of the sensitive attribute is beneficial in this case. A better generalized table:
Age Sex Zipcode Disease
[1, 5] M [10001, 15000] gastric ulcer
[1, 5] M [10001, 15000] dyspepsia
[6, 10] M [15001, 20000] pneumonia
[6, 10] M [15001, 20000] bronchitis
[11, 20] F [20001, 25000] flu
[11, 20] F [20001, 25000] pneumonia
[21, 30] F [30001, 40000] gastritis
[21, 30] F [30001, 40000] gastritis
[21, 30] F [30001, 40000] flu
56 F 58000 respiratory infection
Mary's tuple keeps its exact QI values, and only her disease is generalized to "respiratory infection". The query answer computed from this table is therefore exactly 1.
Personalized anonymity
We propose:
• a mechanism to capture personalized privacy requirements;
• criteria for measuring the degree of security provided by a generalized table.
Guarding node
[Figure: taxonomy of the Disease attribute. any illness → respiratory system problem → respiratory infection → {flu, pneumonia, bronchitis}; any illness → digestive system problem → stomach disease → {gastric ulcer, dyspepsia, gastritis}]
Andy does not want anyone to know that he had a stomach problem
He can specify “stomach disease” as the guarding node for his tuple
The data publisher should prevent an adversary from associating Andy with “stomach disease”
Name Age Sex Zipcode Disease guarding node
Andy 4 M 12000 gastric ulcer stomach disease
Guarding node
[Figure: the Disease taxonomy shown above]
Sarah is willing to disclose her exact symptom
She can specify Ø as the guarding node for her tuple
Name Age Sex Zipcode Disease guarding node
Sarah 28 F 37000 flu Ø
Guarding node
[Figure: the Disease taxonomy shown above]
Bill does not have any special preference
He can specify his own sensitive value as the guarding node for his tuple
Name Age Sex Zipcode Disease guarding node
Bill 5 M 14000 dyspepsia dyspepsia
A personalized approach
[Figure: the Disease taxonomy shown above]
Name Age Sex Zipcode Disease guarding node
Andy 4 M 12000 gastric ulcer stomach disease
Bill 5 M 14000 dyspepsia dyspepsia
Ken 6 M 18000 pneumonia respiratory infection
Nash 9 M 19000 bronchitis bronchitis
Alice 12 F 22000 flu flu
Betty 19 F 24000 pneumonia pneumonia
Linda 21 F 33000 gastritis gastritis
Jane 25 F 34000 gastritis Ø
Sarah 28 F 37000 flu Ø
Mary 56 F 58000 flu flu
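A guarding node protects its whole subtree: a breach happens when the adversary can associate the person with any value under that node. The minimal Python sketch below encodes the taxonomy as a child-to-parent map, using our reading of the tree drawn on the slides, and checks whether a value falls under a guarding node. The names `PARENT`, `ancestors`, `violates` and `GUARDING` are hypothetical, and `None` stands in for Ø.

```python
from typing import Optional

# Hypothetical child -> parent encoding of the Disease taxonomy on the slides.
PARENT = {
    "respiratory system problem": "any illness",
    "digestive system problem": "any illness",
    "respiratory infection": "respiratory system problem",
    "stomach disease": "digestive system problem",
    "flu": "respiratory infection",
    "pneumonia": "respiratory infection",
    "bronchitis": "respiratory infection",
    "gastric ulcer": "stomach disease",
    "dyspepsia": "stomach disease",
    "gastritis": "stomach disease",
}


def ancestors(value: str) -> list[str]:
    """Return `value` followed by every ancestor up to the root ("any illness")."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def violates(guarding_node: Optional[str], value: str) -> bool:
    """True if associating the owner with `value` reveals the guarding node,
    i.e. `value` lies in the guarding node's subtree. None plays the role of Ø."""
    if guarding_node is None:
        return False
    return guarding_node in ancestors(value)


# A few guarding nodes taken from the table above.
GUARDING = {"Andy": "stomach disease", "Bill": "dyspepsia", "Sarah": None}
```

For example, `violates("stomach disease", "dyspepsia")` is true, while `violates("stomach disease", "digestive system problem")` is false, because that more general value does not pin Andy to the protected subtree.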
Personalized anonymity
A table satisfies personalized anonymity with parameter p_breach if and only if no adversary can breach the privacy requirement of any tuple with probability above p_breach
If p_breach = 0.3, then no adversary should have more than a 30% probability of finding out that:
• Andy had a stomach disease
• Bill had dyspepsia
• etc.
Name Age Sex Zipcode Disease guarding node
Andy 4 M 12000 gastric ulcer stomach disease
Bill 5 M 14000 dyspepsia dyspepsia
Ken 6 M 18000 pneumonia respiratory infection
Nash 9 M 19000 bronchitis bronchitis
Alice 12 F 22000 flu flu
Betty 19 F 24000 pneumonia pneumonia
Linda 21 F 33000 gastritis gastritis
Jane 25 F 34000 gastritis Ø
Sarah 28 F 37000 flu Ø
Mary 56 F 58000 flu flu
Personalized anonymity
Personalized anonymity with respect to a predefined parameter p_breach: an adversary can breach the privacy requirement of any tuple with probability at most p_breach
Age Sex Zipcode Disease
[1, 10] M [10001, 20000] gastric ulcer
[1, 10] M [10001, 20000] dyspepsia
[1, 10] M [10001, 20000] pneumonia
[1, 10] M [10001, 20000] bronchitis
[11, 20] F [20001, 25000] flu
[11, 20] F [20001, 25000] pneumonia
21 F 33000 stomach disease
25 F 34000 gastritis
28 F 37000 flu
56 F 58000 respiratory infection
• We need a method for calculating the breach probabilities
What is the probability that Andy had some stomach problem?
Combinatorial reconstruction
Assumptions:
• the adversary has no prior knowledge about each individual
• every individual involved in the microdata also appears in the external database
Combinatorial reconstruction
Andy does not want anyone to know that he had some stomach problem
What is the probability that the adversary can find out that “Andy had a stomach disease”?
Name Age Sex Zipcode
Andy 4 M 12000
Bill 5 M 14000
Ken 6 M 18000
Nash 9 M 19000
Mike 7 M 17000
Alice 12 F 22000
Betty 19 F 24000
Linda 21 F 33000
Jane 25 F 34000
Sarah 28 F 37000
Mary 56 F 58000
Age Sex Zipcode Disease
[1, 10] M [10001, 20000] gastric ulcer
[1, 10] M [10001, 20000] dyspepsia
[1, 10] M [10001, 20000] pneumonia
[1, 10] M [10001, 20000] bronchitis
[11, 20] F [20001, 25000] flu
[11, 20] F [20001, 25000] pneumonia
21 F 33000 stomach disease
25 F 34000 gastritis
28 F 37000 flu
56 F 58000 respiratory infection
Combinatorial reconstruction (cont.)
Can each individual appear more than once?
• No: the primary case
• Yes: the non-primary case
Some possible reconstructions:
[Figure: a reconstruction in the primary case, matching gastric ulcer, dyspepsia, pneumonia and bronchitis to distinct individuals among Andy, Bill, Ken, Nash and Mike]
[Figure: a reconstruction in the non-primary case, where one of these individuals may be matched to more than one value]
Breach probability (primary)
In total, there are 5 × 4 × 3 × 2 = 120 possible reconstructions
If Andy is associated with a stomach disease in n_b reconstructions, the probability that the adversary associates Andy with some stomach problem is n_b / 120
Andy is associated with:
• gastric ulcer in 24 reconstructions
• dyspepsia in 24 reconstructions
• gastritis in 0 reconstructions
n_b = 48, so the breach probability for Andy’s tuple is 48 / 120 = 2 / 5
[Figure: the Disease taxonomy, and QI group 1 linking Andy, Bill, Ken, Nash and Mike to gastric ulcer, dyspepsia, pneumonia and bronchitis]
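The primary-case count can be checked by brute force. The sketch below, a minimal illustration under the slides' assumptions (no prior knowledge, so every reconstruction is equally likely), enumerates every injective assignment of the four published values to the five candidates of QI group [1, 10] M [10001, 20000] and counts those that give Andy a value under "stomach disease". The variable and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Candidates from the external database that match QI group [1, 10] M [10001, 20000],
# and the four sensitive values published for that group (illustrative names).
CANDIDATES = ["Andy", "Bill", "Ken", "Nash", "Mike"]
VALUES = ["gastric ulcer", "dyspepsia", "pneumonia", "bronchitis"]
STOMACH_DISEASE = {"gastric ulcer", "dyspepsia", "gastritis"}  # subtree of the guarding node


def primary_breach(target, candidates, values, guarded):
    """Primary case: each individual owns at most one tuple, so a reconstruction
    assigns the values to distinct candidates. All reconstructions count equally."""
    total = breached = 0
    # owners[i] is the candidate who receives values[i]
    for owners in permutations(candidates, len(values)):
        total += 1
        if any(o == target and v in guarded for o, v in zip(owners, values)):
            breached += 1
    return breached, total, Fraction(breached, total)


print(primary_breach("Andy", CANDIDATES, VALUES, STOMACH_DISEASE))  # (48, 120, Fraction(2, 5))
```

Exhaustive enumeration is only practical for tiny groups like this one; it is meant to confirm the slide's arithmetic, not to scale.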
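Breach probability (non-primary)
In total, there are 5^4 = 625 possible reconstructions
Andy is associated with gastric ulcer, dyspepsia or gastritis in 225 reconstructions (625 minus the 4 × 4 × 5 × 5 = 400 reconstructions in which neither gastric ulcer nor dyspepsia goes to Andy)
n_b = 225, so the breach probability for Andy’s tuple is 225 / 625 = 9 / 25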
[Figure: the Disease taxonomy, and QI group 1 linking Andy, Bill, Ken, Nash and Mike to gastric ulcer, dyspepsia, pneumonia and bronchitis]
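In the non-primary case the only change is that a candidate may receive several values, so the enumeration switches from permutations to a Cartesian product. This is again a minimal brute-force sketch with hypothetical names, assuming every reconstruction is equally likely.

```python
from fractions import Fraction
from itertools import product

CANDIDATES = ["Andy", "Bill", "Ken", "Nash", "Mike"]
VALUES = ["gastric ulcer", "dyspepsia", "pneumonia", "bronchitis"]
STOMACH_DISEASE = {"gastric ulcer", "dyspepsia", "gastritis"}  # subtree of the guarding node


def non_primary_breach(target, candidates, values, guarded):
    """Non-primary case: an individual may own several tuples, so each value can go
    to any candidate independently (len(candidates) ** len(values) reconstructions)."""
    total = breached = 0
    for owners in product(candidates, repeat=len(values)):
        total += 1
        if any(o == target and v in guarded for o, v in zip(owners, values)):
            breached += 1
    return breached, total, Fraction(breached, total)


print(non_primary_breach("Andy", CANDIDATES, VALUES, STOMACH_DISEASE))  # (225, 625, Fraction(9, 25))
```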
Defect of generalization
Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
• Estimated answer: 2 * p, where p is the probability that each of the two pneumonia tuples satisfies the query conditions on Age and Zipcode
Defect of generalization (cont.)
Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
p = Area(R1 ∩ Q) / Area(R1) = 0.05
Estimated answer for Query A: 2 * p = 0.1
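[Figure: the query region Q (Age [0, 30], Zipcode [10001, 20000]) and the generalized region R1 (Age [21, 60], Zipcode [10001, 60000]), with Age on the horizontal axis and Zipcode on the vertical axis]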
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] pneumonia
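The estimate for the generalized table assumes each pneumonia tuple is spread uniformly over its rectangle R1, so p is the share of R1 that lies inside Q. The sketch below is a minimal illustration; it counts integer values per dimension (10 of 40 ages, 10,000 of 50,000 zipcodes), which reproduces the slide's 0.05, whereas treating the intervals as continuous lengths would give a slightly different figure. The helper names are hypothetical.

```python
from fractions import Fraction

# Closed integer intervals (low, high) for each quasi-identifier dimension.
R1 = {"age": (21, 60), "zipcode": (10001, 60000)}  # generalized region of the two pneumonia tuples
Q = {"age": (0, 30), "zipcode": (10001, 20000)}    # selection region of Query A


def points(interval):
    """Number of integer values in a closed interval (0 if it is empty)."""
    low, high = interval
    return max(0, high - low + 1)


def overlap(a, b):
    """Closed interval shared by a and b (possibly empty)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def fraction_inside(region, query):
    """Area(region ∩ query) / Area(region), assuming tuples are spread uniformly
    over the integer grid of the generalized region."""
    p = Fraction(1)
    for dim, interval in region.items():
        p *= Fraction(points(overlap(interval, query[dim])), points(interval))
    return p


p = fraction_inside(R1, Q)
estimate = 2 * p  # two pneumonia tuples share region R1
print(p, estimate)  # 1/20 1/10
```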
Defect of generalization (cont.)
Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
Estimated answer from the generalized table: 0.1
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
• The exact answer should be 1
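The exact answer can be checked by running Query A against the original microdata. The sketch below uses Python's built-in `sqlite3` module with an in-memory table. The table name `microdata` and the rewrite of `Age in [0, 30]` as `age BETWEEN 0 AND 30` are our assumptions, since `Unknown-Microdata` and the interval notation are not valid SQLite syntax.

```python
import sqlite3

# Original microdata from the slide: (name, age, sex, zipcode, disease).
MICRODATA = [
    ("Bob", 23, "M", 11000, "pneumonia"),
    ("Ken", 27, "M", 13000, "dyspepsia"),
    ("Peter", 35, "M", 59000, "dyspepsia"),
    ("Sam", 59, "M", 12000, "pneumonia"),
    ("Jane", 61, "F", 54000, "flu"),
    ("Linda", 65, "F", 25000, "gastritis"),
    ("Alice", 65, "F", 25000, "flu"),
    ("Mandy", 70, "F", 30000, "bronchitis"),
]


def exact_answer():
    """Run Query A on an in-memory copy of the microdata and return the count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE microdata (name TEXT, age INTEGER, sex TEXT, zipcode INTEGER, disease TEXT)"
    )
    conn.executemany("INSERT INTO microdata VALUES (?, ?, ?, ?, ?)", MICRODATA)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM microdata "
        "WHERE disease = 'pneumonia' AND age BETWEEN 0 AND 30 "
        "AND zipcode BETWEEN 10001 AND 20000"
    ).fetchone()
    conn.close()
    return count


print(exact_answer())  # 1 (only Bob qualifies; Sam is 59)
```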
Basic Idea of Anatomy
For a given microdata table, Anatomy releases a quasi-identifier table (QIT) and a sensitive table (ST)
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
microdata
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
Quasi-identifier Table (QIT)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
Sensitive Table (ST)
Basic Idea of Anatomy (cont.)
1. Select a partition of the tuples
Age Sex Zipcode Disease
QI group 1:
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
QI group 2:
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
a 2-diverse partition
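Whether a partition is l-diverse can be checked group by group. The sketch below assumes the frequency-based notion of l-diversity that we understand Anatomy to use: in every QI group, the most frequent sensitive value accounts for at most 1/l of the group's tuples. Other l-diversity variants (for example entropy-based) would need a different test. `PARTITION` and `is_l_diverse` are hypothetical names.

```python
from collections import Counter

# Sensitive values of each QI group in the partition above.
PARTITION = {
    1: ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia"],
    2: ["flu", "gastritis", "flu", "bronchitis"],
}


def is_l_diverse(groups, l):
    """A group passes if its most frequent sensitive value covers at most 1/l of
    its tuples; the test uses integers (count * l <= size) to avoid rounding."""
    for values in groups.values():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count * l > len(values):
            return False
    return True


print(is_l_diverse(PARTITION, 2), is_l_diverse(PARTITION, 3))  # True False
```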
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive table (ST) based on the selected partition
Age Sex Zipcode
group 1:
23 M 11000
27 M 13000
35 M 59000
59 M 12000
group 2:
61 F 54000
65 F 25000
65 F 25000
70 F 30000
quasi-identifier table (QIT)
Disease
group 1:
pneumonia
dyspepsia
dyspepsia
pneumonia
group 2:
flu
gastritis
flu
bronchitis
sensitive table (ST)
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive table (ST) based on the selected partition
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
quasi-identifier table (QIT)
Group-ID Disease
1 pneumonia
1 dyspepsia
1 dyspepsia
1 pneumonia
2 flu
2 gastritis
2 flu
2 bronchitis
sensitive table (ST)
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive table (ST) based on the selected partition
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
quasi-identifier table (QIT)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
sensitive table (ST)
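Step 2 can be sketched in a few lines: the QI values are copied unchanged with a Group-ID attached, and the ST keeps only per-group counts of each sensitive value. This is a minimal illustration assuming the partition from step 1 is given as one group ID per microdata row; `MICRODATA`, `GROUP_IDS` and `anatomize` are hypothetical names, and choosing a good partition is a separate problem not shown here.

```python
from collections import Counter

# Microdata without names: (age, sex, zipcode, disease), plus the group chosen in step 1.
MICRODATA = [
    (23, "M", 11000, "pneumonia"),
    (27, "M", 13000, "dyspepsia"),
    (35, "M", 59000, "dyspepsia"),
    (59, "M", 12000, "pneumonia"),
    (61, "F", 54000, "flu"),
    (65, "F", 25000, "gastritis"),
    (65, "F", 25000, "flu"),
    (70, "F", 30000, "bronchitis"),
]
GROUP_IDS = [1, 1, 1, 1, 2, 2, 2, 2]


def anatomize(rows, group_ids):
    """Split the microdata into a QIT (QI values + Group-ID) and an ST
    (Group-ID, sensitive value, count); the QI values are published unchanged."""
    qit = [(age, sex, zipcode, gid)
           for (age, sex, zipcode, _disease), gid in zip(rows, group_ids)]
    counts = Counter((gid, row[3]) for row, gid in zip(rows, group_ids))
    st = sorted((gid, disease, n) for (gid, disease), n in counts.items())
    return qit, st


qit, st = anatomize(MICRODATA, GROUP_IDS)
for record in st:
    print(record)
```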
Privacy Preservation
From a pair of QIT and ST generated from an l-diverse partition, the adversary can infer the sensitive value of each individual with confidence at most 1/l
Name Age Sex Zipcode
Bob 23 M 11000
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
quasi-identifier table (QIT)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
sensitive table (ST)
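The 1/l bound can be seen on Bob: his QI values place him in group 1, and the ST reveals only how many of that group's tuples carry each disease. The sketch below assumes the adversary knows Bob's exact QI values and that Bob is in the table, and it takes the first matching QIT row (QI values here map to a single group). The function name `posterior` is hypothetical.

```python
from fractions import Fraction

# QIT and ST as published above: (age, sex, zipcode, group_id) and (group_id, disease, count).
QIT = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
ST = [(1, "dyspepsia", 2), (1, "pneumonia", 2), (2, "bronchitis", 1), (2, "flu", 2), (2, "gastritis", 1)]


def posterior(qi, qit, st):
    """Adversary's belief about the sensitive value of an individual whose QI values
    (age, sex, zipcode) are known and who is known to be in the table."""
    group = next(row[3] for row in qit if row[:3] == qi)
    in_group = [(disease, n) for gid, disease, n in st if gid == group]
    size = sum(n for _, n in in_group)
    return {disease: Fraction(n, size) for disease, n in in_group}


bob = posterior((23, "M", 11000), QIT, ST)
print(bob)               # {'dyspepsia': Fraction(1, 2), 'pneumonia': Fraction(1, 2)}
print(max(bob.values()))  # 1/2, i.e. at most 1/l with l = 2
```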
Accuracy of Data Analysis
Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
quasi-identifier table (QIT)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
sensitive table (ST)
Accuracy of Data Analysis (cont.)
Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
2 patients in group 1 have contracted pneumonia
2 out of the 4 patients in group 1 satisfy the query conditions on Age and Zipcode
Estimated answer for Query A: 2 * 2 / 4 = 1, which is also the actual result from the original microdata
[Figure: the query region Q and the four group-1 tuples t1 to t4, with x (Age) on the horizontal axis and y (Zipcode) on the vertical axis]
Tuples t1 to t4 of group 1:
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
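The Anatomy estimate uses exact QI values: within each group, a pneumonia tuple is equally likely to belong to any of the group's QIT rows, so it contributes the fraction of rows that satisfy the Age and Zipcode conditions. The sketch below is a minimal illustration of that reasoning, with Query A's conditions hard-coded; `estimate_query_a` is a hypothetical name.

```python
from fractions import Fraction

QIT = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
ST = [(1, "dyspepsia", 2), (1, "pneumonia", 2), (2, "bronchitis", 1), (2, "flu", 2), (2, "gastritis", 1)]


def estimate_query_a(qit, st):
    """Per group: (#pneumonia tuples in ST) * (#QIT rows matching Age/Zipcode) / (group size)."""
    estimate = Fraction(0)
    for group in sorted({row[3] for row in qit}):
        rows = [row for row in qit if row[3] == group]
        matching = sum(1 for age, _sex, zipcode, _gid in rows
                       if 0 <= age <= 30 and 10001 <= zipcode <= 20000)
        pneumonia = sum(n for gid, disease, n in st if gid == group and disease == "pneumonia")
        estimate += pneumonia * Fraction(matching, len(rows))
    return estimate


print(estimate_query_a(QIT, ST))  # 1
```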
Conclusion
Limitations of l-diversity:
• l-diversity is difficult and unnecessary to achieve
• l-diversity is insufficient in preventing attribute disclosure
t-Closeness as a new privacy measure:
• The overall distribution of sensitive values should be public information
• The separation of the knowledge gain
Earth Mover's Distance (EMD) to measure distance:
• EMD captures semantic distance well
• Simple formulas for three ground distances
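Two of those simple formulas can be sketched as follows, as we understand them from the t-closeness work: with an equal ground distance (every pair of distinct values at distance 1), EMD is half the L1 distance between the distributions; with an ordered ground distance |i − j| / (m − 1) over m ordered values, EMD is the sum of absolute cumulative differences divided by m − 1. The hierarchical ground distance is not shown, and the distributions in the example are made up for illustration.

```python
from fractions import Fraction


def emd_equal(p, q):
    """EMD when every pair of distinct values is at ground distance 1:
    half the L1 distance between the two distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2


def emd_ordered(p, q):
    """EMD for m ordered values with ground distance |i - j| / (m - 1):
    the sum of absolute cumulative differences, divided by m - 1."""
    m = len(p)
    running, total = Fraction(0), Fraction(0)
    for pi, qi in zip(p, q):
        running += pi - qi
        total += abs(running)
    return total / (m - 1)


# Hypothetical example: a group whose 3 tuples take the 3 lowest of 9 ordered values,
# compared with a table-wide distribution that is uniform over all 9 values.
group = [Fraction(1, 3)] * 3 + [Fraction(0)] * 6
table = [Fraction(1, 9)] * 9
print(emd_equal(group, table), emd_ordered(group, table))  # 2/3 3/8
```

A t-closeness check would then compare such a distance, computed for each QI group against the whole table, with the threshold t.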
Conclusions
m-invariant tables support republication of dynamic datasets
Guarding nodes allow individuals to describe their privacy requirements better
Anatomy outperforms generalization by allowing much more accurate data analysis on the published data.
Thank you!
Questions?