TRANSCRIPT
Marco Gaboardi, School of Computing, University of Dundee
Differential privacy
Private Queries?
Differential Privacy: motivation
A. Haeberlen
Motivation: Protecting privacy
USENIX Security (August 12, 2011)
19144 02-15-1964 flu
19146 05-22-1955 brain tumor
34505 11-01-1988 depression
25012 03-12-1972 diabetes
16544 06-14-1956 anemia
...
Fair insurance plan?
Resend policy?
Data
"I know Bob is born 05-22-1955 in Philadelphia…"
A first approach: anonymization.
• Using different correlated anonymized data sets one can learn private data.
Differential Privacy: motivation
A. Haeberlen
Motivation: Protecting privacy
2USENIX Security (August 12, 2011)
19144 02-15-1964 flue
19146 05-22-1955 brain tumor
34505 11-01-1988 depression
25012 03-12-1972 diabets
16544 06-14-1956 anemia
...
Fair insurance
plan?
Resend
policy?
DataI know Bob is
born 05-22-1955
in Philadelphia…
A first approach: anonymization.I Using different correlated anonymized data sets one
can learn private data.
query → answer
medical correlation?
Does Joe have cancer? I know he traveled to Atlantis...
Attacker
A classic case...
Adding statistical noise
Differential Privacy: the idea
A. Haeberlen
Promising approach: Differential privacy
USENIX Security (August 12, 2011)
Private data
N(flu, >1955)? → 826 ± 10
N(brain tumor, 05-22-1955)? → 3 ± 700
Noise
?!?
Differential Privacy: Ensuring that the presence/absence of an individual has a negligible statistical effect on the query's result.
Trade-off between utility and privacy.
query → answer + noise
The promise of Differential Privacy
"You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available."
C. Dwork, A. Roth 2014
(ε,δ)-Differential Privacy
Definition. Given ε, δ ≥ 0, a probabilistic query Q : db → R is (ε,δ)-differentially private iff for all b1, b2 : db differing in one row and for every S ⊆ R:
Pr[Q(b1) ∈ S] ≤ exp(ε) · Pr[Q(b2) ∈ S] + δ
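The definition can be checked numerically on a concrete mechanism. A minimal sketch, using randomized response (a classic ε-DP mechanism, not taken from these slides): each individual reports their true bit with probability e^ε/(1 + e^ε), and the (ε, 0)-DP inequality holds for every output.

```python
import math

def rr_prob(true_bit, reported_bit, eps):
    # Probability that randomized response outputs `reported_bit`
    # when the individual's true bit is `true_bit`: tell the truth
    # with probability e^eps / (1 + e^eps), lie otherwise.
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return p_truth if reported_bit == true_bit else 1.0 - p_truth

eps = 0.5
for reported in (0, 1):
    p1 = rr_prob(0, reported, eps)  # neighboring database where the row is 0
    p2 = rr_prob(1, reported, eps)  # neighboring database where the row is 1
    # The (eps, 0)-DP inequality, with a small tolerance for rounding:
    assert p1 <= math.exp(eps) * p2 + 1e-12
    assert p2 <= math.exp(eps) * p1 + 1e-12
```

Here δ = 0; a nonzero δ would allow the inequality to fail on a set of outcomes of total probability at most δ.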
The slides annotate the definition piece by piece:
• Q — a query returning an answer with noise
• ε, δ — privacy parameters
• b1, b2 differing in one row — a notion of individual data we want to protect
• S ⊆ R — and over all the possible "bad" outcomes
A way to achieve it: adding noise
• Suppose that we have x1, …, xn ∈ [0,1]
• We want to compute mean(x1, …, xn) = μ
• Release a noised version: μ° = μ + Laplace noise ~ ε
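A minimal sketch of this noised-mean release, assuming the standard calibration: changing one of the n records in [0,1] moves the mean by at most 1/n (the sensitivity), so Laplace noise of scale 1/(n·ε) suffices. Function names are illustrative.

```python
import math
import random

def laplace_sample(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(xs, epsilon):
    """Release mean(xs), for xs in [0,1], with epsilon-DP.

    Sensitivity of the mean is 1/n, so the noise scale is
    sensitivity / epsilon = 1 / (n * epsilon).
    """
    n = len(xs)
    mu = sum(xs) / n
    return mu + laplace_sample(1.0 / (n * epsilon))
```

For large n the noise scale 1/(n·ε) is tiny, which is why the aggregate stays useful while any single record is protected.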
This mechanism is ε-differentially private.
ε-Differential Privacy
Q(b) vs Q(b ∪ {x})
Probability of a bad event (the "Bad Event" S):
log( Pr[Q(b ∪ {x}) ∈ S] / Pr[Q(b) ∈ S] ) ≤ ε
Probability of the bad event, per dataset:
• without my data: 16%
• with my data: 16.8%
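These two numbers can be plugged straight into the log-ratio bound; the corresponding ε is small:

```python
import math

# Bad event: 16% probability without my data, 16.8% with it.
# The log-ratio is about 0.0488, so any eps >= 0.05 covers it.
log_ratio = math.log(0.168 / 0.16)
assert log_ratio < 0.05
```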
(ε,δ)-Differential Privacy
Overall, ε is a good bound…
…but it can fail with probability δ.
Noise vs Accuracy
• We run a data analysis in the first place to obtain an accurate answer.
• We have several ways to estimate how good our answer is. Two examples:
- comparing the answer with the output of the noise-free computation,
- comparing the answer with the expected value of the query on the population.
Accuracy of the analysis
• 60%: 10.2 ± 0.5
• 80%: 10.2 ± 1
• 90%: 10.2 ± 2
(α,β)-accurate
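The (α,β) pairs above can be estimated empirically for a concrete noise distribution. A sketch, assuming Laplace noise (the helper and its name are illustrative): β is the probability that the noisy answer stays within α of the true value.

```python
import math
import random

def laplace_sample(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def empirical_beta(scale, alpha, trials=20000):
    """Empirical probability that the noisy release falls within
    alpha of the true answer, i.e. the beta in (alpha, beta)-accuracy."""
    hits = sum(abs(laplace_sample(scale)) <= alpha for _ in range(trials))
    return hits / trials

# For Laplace noise the closed form is Pr[|noise| <= alpha] = 1 - exp(-alpha/scale):
# e.g. alpha = 2 with scale = 1 gives about 86.5%.
```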
Privacy-Accuracy trade-off: M(ε,δ)
A simple DP mechanism
query → ε-differentially private query
Laplace Noise ~ ε
Composition
• We can easily compose differentially private data analyses, with the guarantee degrading gracefully.
ε-DP Query
ε-DP Query
ε-DP Query
...
n·ε-DP Query
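Sequential composition can be sketched directly: answer each counting query with its own budget, and the release of all answers together is private at the summed budget. A minimal illustration, with names of my own choosing:

```python
import math
import random

def laplace_sample(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def answer_queries(db, queries, eps_each):
    """Answer each counting query with budget eps_each.

    By sequential composition, releasing all the answers together
    is (len(queries) * eps_each)-differentially private.
    """
    n = len(db)
    answers = []
    total_eps = 0.0
    for q in queries:
        true_fraction = sum(1 for row in db if q(row)) / n  # in [0,1]
        answers.append(true_fraction + laplace_sample(1.0 / (n * eps_each)))
        total_eps += eps_each
    return answers, total_eps
```

Tracking `total_eps` is exactly the "privacy budget" bookkeeping that the tools below automate.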
Linking different data
Tools
• The composition properties make it possible to build tools that automatically enforce differential privacy:
- PINQ (Microsoft Research)
- Airavat (U. of Texas)
- Fuzz (U. of Pennsylvania)
- CertiPriv (IMDEA Software, Madrid)
- GUPT (UC Berkeley)
Natural differential privacy?
The role of ε and δ
• ε is:
- a bound on the increase in the probability of a bad event happening if I decide to participate in a study,
- an overall privacy budget for a given collection of data.
• δ, instead, is the probability that the analysis will fail to provide this guarantee.
The analyst's view
• The mechanism has an error probability AM(N,ε,δ).
• Goal: obtain an answer with error probability smaller than a target α, without exceeding the budget B:
AM(N,ε,δ) ≤ α
The individual's view
• He has a cost function f on the output of M,
• he has a worst-case cost W,
• he is compensated with S.
• Goal: participate only if compensated for the cost:
(e^ε − 1)·Ef + δ·W ≤ S
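The participation condition is a one-line check. A sketch, with illustrative names, taking Ef as the expected cost and W as the worst-case cost:

```python
import math

def should_participate(eps, delta, expected_cost, worst_case_cost, compensation):
    # Rationality condition from the inequality above:
    # participate only when (e^eps - 1)*Ef + delta*W <= S.
    return (math.exp(eps) - 1.0) * expected_cost + delta * worst_case_cost <= compensation
```

For small ε the left-hand side grows roughly linearly in ε (since e^ε − 1 ≈ ε), so halving the privacy parameter roughly halves the compensation required.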
(e^ε − 1)·Ef: increase in the cost if he participates
δ·W: cost in the case of failure
![Page 42: School of Computing University of Dundee · A. Haeberlen Motivation: Protecting privacy 2 USENIX Security (August 12, 2011) 19144 02-15-1964 flue 19146 05-22-1955 brain tumor 34505](https://reader033.vdocument.in/reader033/viewer/2022042002/5e6e543ed1de9b6eff6143be/html5/thumbnails/42.jpg)
Combining the two views
(e^ε − 1)·Ef·N + δ·W·N ≤ B
AM(N,ε,δ) ≤ α

Figure 2: Feasible ε, N for accuracy α and budget B.
For the budget constraint, let the budget be B = 4.5 × 10^5. To estimate each individual's base cost, we need to reason about the individual's costs that might be affected by this study.

For the sake of example, suppose that the health insurance company does not know that the individual smokes, and the individual is afraid that the insurance company will discover this and raise her premiums. Taking some average figures, the average health insurance premium for smokers is $1274 more, compared to nonsmokers [20]. Thus, some participants fear a price increase of $1274.[10]
To calculate how much individuals should be compensated, we reason about the probability that the insurance company finds out that an individual smokes, even if she does not participate in the study. This is not impossible: perhaps an insurance agency employee observes the individual smoking outside. However, it also is not very likely, since insurance agencies generally do not spy on individuals trying to catch smokers.

So, suppose the participants think there is a moderate, 20% chance that the insurance company finds out that they are smokers, even if they do not participate.[11] Thus, we can upper bound the base cost by E = 0.20 · 1274 = 254.8; this is the cost the participants expect, even if they do not participate.
Plugging these numbers in, Equation (8) is satisfied and the studyis feasible. For instance, e = 8.4⇥ 10�4 and N = 2⇥ 106 satisfythe original accuracy and budget constraints; each participant iscompensated (ee � 1) ·E = $0.22, for a total cost of 4.4⇥ 105 <B = 4.5⇥105.
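The arithmetic in this example can be verified directly; a quick sketch (all values taken from the text above):

```python
import math

E = 0.20 * 1274          # expected base cost: 20% chance of a $1274 premium increase
B = 4.5e5                # total study budget
eps, N = 8.4e-4, 2e6     # the feasible choice of (epsilon, N) from the text

per_participant = (math.exp(eps) - 1) * E   # compensation per participant
total = per_participant * N                 # total cost, which must stay below B
```

Running this gives a per-participant compensation of about $0.21 (the text rounds to $0.22) and a total cost of roughly 4.3×10^5, under the budget B = 4.5×10^5.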
Figure 2 gives a pictorial representation of the constraints. For a fixed accuracy α, the blue curve (marked α) contains values of (ε, N) that achieve error α. The blue shaded region (above the α curve) shows points that are feasible for that accuracy: there, (ε, N) give accuracy better than α. The red curve (marked B) and red shaded region (below the B curve) show the same thing for a fixed budget B. The intersection of the two regions (the purple area) contains values of (ε, N) that satisfy both the accuracy constraint and the budget constraint. Figure 3 shows the curves for different set values of α and B.
[10] Note that it would not make sense to include the participant's current health insurance cost as part of the bad event: participating in a study will not make it more likely that participants need to pay for health insurance (presumably, they are already paying for it). Rather, participating in a study may lead to a payment increase, which is the bad event.

[11] Presumably, nonsmokers would expect much lower cost. They should not expect zero cost, since there is always a chance that the insurance company wrongly labels them a smoker, but this happens with low probability.
Figure 3: Constant accuracy curves for α1 < α2, constant budget curves for B1 < B2 (axes: ε vs. N).
A more realistic example: answering many queries

The previous example has a significant drawback in that the mechanism only answers a single query. Any realistic study, medical or otherwise, will need many more queries. In this section, we will see how to carry out calculations for a more sophisticated algorithm from the privacy literature: the multiplicative weights exponential mechanism (MWEM) [13, 14].

MWEM is a mechanism that can answer a large number (exponential in N) of counting queries: queries of the form "What fraction of the records in the database satisfy property P?" For a concrete example, suppose that the space of records is bit strings of length 20, i.e. X = {0,1}^20. Each individual's bit string can be thought of as a list of attributes: the first bit might encode the gender, the second bit might encode the smoking status, the third bit might encode whether the age is above 50 or not, etc. Then, queries like "What fraction of subjects are male, smokers and above 50?" or "What proportion of subjects are female nonsmokers?" are counting queries.
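Evaluating a counting query over such bit-string records is straightforward; a minimal sketch (the attribute encoding and sample records are invented for illustration, following the slide's example):

```python
# Records are 20-bit attribute vectors: bit 0 = male, bit 1 = smoker,
# bit 2 = age above 50 (encoding assumed for illustration).
records = [
    (1, 1, 1) + (0,) * 17,   # male smoker above 50
    (0, 0, 0) + (0,) * 17,   # female nonsmoker
    (1, 0, 1) + (0,) * 17,   # male nonsmoker above 50
    (0, 0, 1) + (0,) * 17,   # female nonsmoker above 50
]

def counting_query(db, predicate):
    """Fraction of records in db satisfying property P (the predicate)."""
    return sum(1 for r in db if predicate(r)) / len(db)

male_smokers_over_50 = counting_query(
    records, lambda r: r[0] == 1 and r[1] == 1 and r[2] == 1)
female_nonsmokers = counting_query(
    records, lambda r: r[0] == 0 and r[1] == 0)
```

Note that this computes the exact answers; MWEM's job is to answer many such queries at once while adding enough noise that the results are ε-differentially private.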
We will use an accuracy bound for MWEM [14]. For a data universe X, set of queries C, and number of records N, the ε-private MWEM answers all queries in C to within additive error T with probability at least 1 − β, where

T = ( 128 ln|X| · ln(32|C| ln|X| / (β T²)) / (ε N) )^(1/3).
To fit this into our model, we define the accuracy function A(ε, N) to be the probability of exceeding error T on any query, i.e. β above. Solving, we can set

A(ε, N) := β = (32|C| ln|X| / T²) · exp(−ε N T³ / (128 ln|X|)),

fixing the accuracy constraint A(ε, N) ≤ α. The budget constraint is (e^ε − 1)·E·N ≤ B, like the previous example.
For the various parameters, suppose we want X = {0,1}^5 so |X| = 2^5, and accuracy T = 0.2 for 20% error. Further, we want to answer 1000 queries, so |C| = 1000. For the budget, suppose the individuals remain worried about their health insurance premiums, so E = 254.8. If the budget B > 2.9×10^8, the constraints are satisfiable: take ε = 0.2, N = 5×10^6, when each participant is paid (e^ε − 1)·E = 56.4. In Section 8, we will see a different version of MWEM with better costs.
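The MWEM feasibility check can likewise be done numerically; a sketch using the parameters above (the formula is the accuracy bound A(ε, N) from this section):

```python
import math

# Parameters from the example: |X| = 2^5, |C| = 1000, T = 0.2
ln_X = math.log(2 ** 5)
num_queries, T = 1000, 0.2
eps, N = 0.2, 5e6
E, B = 254.8, 2.9e8

# Failure probability beta = A(eps, N) from the accuracy bound
beta = (32 * num_queries * ln_X / T ** 2) \
    * math.exp(-eps * N * T ** 3 / (128 * ln_X))

# Budget check: (e^eps - 1) * E per participant, N participants
per_participant = (math.exp(eps) - 1) * E
total = per_participant * N
```

This gives β ≈ 0.04 and a per-participant payment of about 56.4, for a total just under the budget of 2.9×10^8, matching the figures in the text.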
![Page 43: School of Computing University of Dundee · A. Haeberlen Motivation: Protecting privacy 2 USENIX Security (August 12, 2011) 19144 02-15-1964 flue 19146 05-22-1955 brain tumor 34505](https://reader033.vdocument.in/reader033/viewer/2022042002/5e6e543ed1de9b6eff6143be/html5/thumbnails/43.jpg)
Who is developing DP?
• Computer scientists from different research areas: algorithms, machine learning, programming languages, systems and networks, cryptography, databases,
• Statisticians and data analysts,
• Researchers in social science, medicine?
![Page 44: School of Computing University of Dundee · A. Haeberlen Motivation: Protecting privacy 2 USENIX Security (August 12, 2011) 19144 02-15-1964 flue 19146 05-22-1955 brain tumor 34505](https://reader033.vdocument.in/reader033/viewer/2022042002/5e6e543ed1de9b6eff6143be/html5/thumbnails/44.jpg)
Research projects
• Putting Differential Privacy to Work (University of Pennsylvania)
• Privacy Tools for Sharing Research Data (Harvard University)
• Enabling Medical Research with Differential Privacy (University of Illinois)
![Page 45: School of Computing University of Dundee · A. Haeberlen Motivation: Protecting privacy 2 USENIX Security (August 12, 2011) 19144 02-15-1964 flue 19146 05-22-1955 brain tumor 34505](https://reader033.vdocument.in/reader033/viewer/2022042002/5e6e543ed1de9b6eff6143be/html5/thumbnails/45.jpg)
Who is interested in DP?
• Facebook ads platform,
• The California Public Utilities Commission,
• US Census "OnTheMap",
• Orange security lab,
• A few start-ups for third-party analysis,
• Several research projects.
![Page 46: School of Computing University of Dundee · A. Haeberlen Motivation: Protecting privacy 2 USENIX Security (August 12, 2011) 19144 02-15-1964 flue 19146 05-22-1955 brain tumor 34505](https://reader033.vdocument.in/reader033/viewer/2022042002/5e6e543ed1de9b6eff6143be/html5/thumbnails/46.jpg)
Thanks!