the optimal mechanism in differential privacy · any two neighboring datasets 1, 2,and all...

61
The Optimal Mechanism in Differential Privacy Quan Geng Advisor: Prof. Pramod Viswanath 3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 1

Upload: others

Post on 17-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

The Optimal Mechanism in Differential Privacy

Quan GengAdvisor: Prof. Pramod Viswanath

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 1

Page 2: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Outline

•Background on differential privacy

•Problem formulation

•Main result on the optimal mechanism

• Extensions to other settings

•Conclusion and future work

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 2

Page 3: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

•Vast amounts of personal information are collected

•Data analysis produces a lot of useful applications

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 3

Page 4: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

•How to release the data while protecting individual’s privacy?

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 4

Big DataRelease data/info.

Analyst

Applications

Page 5: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation: Netflix Prize

• (2006) Netflix hosted a contest to improve movie recommendation

•Released dataset• 100 million ratings

• 500 thousand users

• 18 thousand movies

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 5

Image Credit: Arvind Narayanan

Page 6: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation: Netflix Prize

• (2006) Netflix hosted a contest to improve movie recommendation

•Released dataset• 100 million ratings

• 500 thousand users

• 18 thousand movies

•Anonymization to protect customer privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 6

Image Credit: Arvind Narayanan

Page 7: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation: Netflix Prize Dataset Release

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 7

Image Credit: Arvind Narayanan

[Narayanan, Shmatikov 2008]

Page 8: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 8

• How to protect PRIVACY resilient to attacks with arbitrary side information?

Page 9: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 9

• How to protect PRIVACY resilient to attacks with arbitrary side information?

• Ans: randomized releasing mechanism

Page 10: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 10

• How to protect PRIVACY resilient to attacks with arbitrary side information?

• Ans: randomized releasing mechanism

•How much randomness is needed?• completely random (no utility)

• deterministic (no privacy)

Page 11: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Motivation

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 11

• How to protect PRIVACY resilient to attacks with arbitrary side information?

• Ans: randomized releasing mechanism

•How much randomness is needed?• completely random (no utility)

• deterministic (no privacy)

• Differential Privacy [Dwork et. al. 06]: one way to quantify the level of randomness and privacy

Page 12: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 12

database

Collect data

Users

Database Curator

Analyst

Query

Released info.

• Database Curator: How to answer a query, providing usefuldata information to the analyst, while still protecting the privacy of each user.

Page 13: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 13

database

Collect data

Users

Database Curator

Analyst

Query

Released info.

dataset: D query function: 𝑞 (D) randomized released mechanism: K DAge

A: 20

B: 35

C: 43

D: 30

q: How many people are older than 32?

q(D) = 2

K(D) = 1 w.p. 1/8K(D) = 2 w.p. 1/2K(D) = 3 w.p. 1/4K(D) = 4 w.p. 1/8

Page 14: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 14

• Two neighboring datasets: differ only at one element

Age

A: 20

B: 35

C: 43

D: 30

Age

A: 20

B: 35

D: 30

𝐷1 𝐷2

Page 15: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 15

• Two neighboring datasets: differ only at one element

Age

A: 20

B: 35

C: 43

D: 30

Age

A: 20

B: 35

D: 30

𝐷1 𝐷2

Differential Privacy: presence or absence of any individual record in the dataset should not affect the released information significantly.

𝐾 𝐷1 ≈ 𝐾(𝐷2)

Page 16: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 16

• A randomized mechanism 𝐾 gives 𝝐-differential privacy, if for any two neighboring datasets 𝐷1, 𝐷2 , and all 𝑆 ⊂Range(K),

𝐏𝐫 𝐊 𝐃𝟏 ∈ 𝐒 ≤ 𝐞𝛜 𝐏𝐫 𝐊 𝐃𝟐 ∈ 𝐒

(make hypothesis testing hard)

• 𝝐 quantifies the level of privacy• 𝜖 → 0, high privacy• 𝜖 → +∞, low privacy• a social question to choose 𝝐 (can be 0.01, 0.1, 1, 10 …)

Page 17: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 17

• Q: How much perturbation needed to achieve 𝜖-DP?

• A: depends on how different 𝑞 𝐷1 , 𝑞(𝐷2) are

Page 18: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 18

• Q: How much perturbation needed to achieve 𝜖-DP?

• A: depends on how different 𝑞 𝐷1 , 𝑞(𝐷2) are

𝑞 𝐷1 , 𝑞(𝐷2)

If 𝑞 𝐷1 = 𝑞 𝐷2 , no perturbation is needed.

Page 19: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 19

• Q: How much perturbation needed to achieve 𝜖-DP?

• A: depends on how different 𝑞 𝐷1 , 𝑞(𝐷2) are

𝑓𝐷1(𝑥) 𝑓𝐷2(𝑥)

𝑞 𝐷1 𝑞(𝐷2)

If |𝑞 𝐷1 − 𝑞 𝐷2 | is small, use small perturbation

Page 20: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 20

• Q: How much perturbation needed to achieve 𝜖-DP?

• A: depends on how different 𝑞 𝐷1 , 𝑞(𝐷2) are

𝑓𝐷1(𝑥)𝑓𝐷2(𝑥)

𝑞 𝐷1 𝑞(𝐷2)

If |𝑞 𝐷1 − 𝑞 𝐷2 | is large, use large perturbation

Page 21: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 21

• Global Sensitivity Δ: how different when q is applied to neighboring datasets

𝜟 ≔ 𝐦𝐚𝐱(𝑫𝟏,𝑫𝟐) 𝒂𝒓𝒆 𝒏𝒆𝒊𝒈𝒉𝒃𝒐𝒓𝒔

|𝒒 𝑫𝟏 − 𝒒 𝑫𝟐 |

• Example: for a count query, Δ = 1

Page 22: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Background on Differential Privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 22

• Laplace Mechanism:

𝑲 𝑫 = 𝒒 𝑫 + 𝑳𝒂𝒑(Δ

𝜖),

𝐿𝑎𝑝𝛥

𝜖is a r.v. with p.d.f 𝑓 𝑥 =

1

2𝜆𝑒−

|𝑥|

𝜆 , 𝜆 =Δ

𝜖

• Basic tool in DP

𝑓𝐷1(𝑥) 𝑓𝐷2(𝑥)

Page 23: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Optimality of Existing work?

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 23

• Laplace Mechanism:

𝑲 𝑫 = 𝒒 𝑫 + 𝑳𝒂𝒑(Δ

𝜖),

• Two Questions:• Is data-independent perturbation optimal?

• Assume data-independent perturbation, is Laplacian distribution optimal?

Page 24: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Optimality of Existing work?

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 24

• Laplace Mechanism:

𝑲 𝑫 = 𝒒 𝑫 + 𝑳𝒂𝒑(Δ

𝜖),

• Two questions:• Is data-independent perturbation optimal?• Assume data-independent perturbation, is Laplacian distribution

optimal?

• Our results:• data-independent perturbation is optimal• Laplacian distribution is not optimal: the optimal is staircase

distribution

Page 25: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Problem Formulation: DP constraint

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 25

•A generic randomized mechanism 𝐾 is a family of noise probability measures (t is query output)

𝑲 = {𝝂𝒕 ∶ 𝒕 ∈ 𝑹 }

•DP constraint: ∀𝑡1, 𝑡2 ∈ 𝑅, 𝑠. 𝑡. 𝑡1 − 𝑡2 ≤ Δ,𝛎𝒕𝟏 𝑺 ≤ 𝒆𝝐𝛎𝒕𝟐 𝐒 + 𝒕𝟏 − 𝒕𝟐 , ∀measurable set S

Page 26: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Problem Formulation: utility (cost) model

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 26

• Cost function on the noise𝐿: 𝑅 → 𝑅

• Given query output 𝑡, Cost(𝜈𝑡) = ∫ 𝐿 𝑥 𝜈𝑡 𝑑𝑥

• Example• 𝐿(𝑥) = |𝑥|, noise magnitude• 𝐿 𝑥 = 𝑥 2, noise power

• Objective(minmax): minimize sup

t∈𝑅∫𝑥∈𝑅 𝐿 𝑥 𝜈𝑡(𝑑𝑥)

Page 27: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Problem Formulation: put things all together

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 27

𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 supt∈𝑅

𝑥∈𝑅

𝐿 𝑥 𝜈𝑡(𝑑𝑥)

s.t. 𝛎t1 S ≤ eϵ𝛎t2 S + t1 − t2 ,

∀measurable set S, ∀t1, t2 ∈ R, s. t. t1 − t2 ≤ Δ,

Page 28: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result: 𝜈𝑡 = 𝜈

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 28

In the optimal mechanism, 𝜈𝑡 is independent of 𝑡 (under a technical condition).

Page 29: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result: 𝜈𝑡 = 𝜈

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 29

In the optimal mechanism, 𝜈𝑡 is independent of 𝑡 (under a technical condition).

𝜈𝑡(𝑆)

𝑡𝑇 2𝑇-𝑇 𝑇

𝑛

2𝑇

𝑛

Page 30: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result: 𝜈𝑡 = 𝜈

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 30

𝜈𝑡(𝑆)

𝑡𝑇 2𝑇-𝑇 𝑇

𝑛2𝑇

𝑛

In the optimal mechanism among ∪𝑇>0∪𝑛≥1 𝐾𝑇,𝑛,

𝜈𝑡 is independent of 𝑡

Page 31: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Optimal noise probability distribution

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 31

𝑲 𝑫 = 𝒒 𝑫 + 𝑿,

Minimize ∫ L x P dxs.t. Pr X ∈ S ≤ eϵ Pr X ∈ S + d ,

∀ d ≤ Δ,measurable set S

Page 32: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Optimal noise probability distribution

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 32

𝑲 𝑫 = 𝒒 𝑫 + 𝑿,

• 𝐿 𝑥 satisfies:• symmetric, and increasing of |x|

• sup𝐿 𝑇+1

𝐿 𝑇< +∞

Minimize ∫ L x P dxs.t. Pr X ∈ S ≤ eϵ Pr X ∈ S + d ,

∀ d ≤ Δ,measurable set S

Page 33: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 33

The optimal probability distribution has

staircase-shaped p.d.f.

Page 34: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 34

Page 35: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Main Result

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 35

R.V. generation• Sign: binary r.v.• Interval: geometric r.v.• Segment: binary r.v.• Uniform r.v.

Page 36: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Proof Ideas

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 36

• Key idea:

• Divide each interval [𝑘Δ,(𝑘+1)Δ) into 𝑖 bins

• Approximate using piecewise constant p.d.f.

𝑎1

𝑎2𝑎3

𝑎𝑖 =𝑃(𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙)

𝑙𝑒𝑛𝑔𝑡ℎ(𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙)

Page 37: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Proof Ideas

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 37

• p.d.f. should be decreasing

𝑎1

𝑎2 𝑎3

𝑎1

𝑎2 𝑎3

Page 38: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Proof Ideas

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 38

• p.d.f. should be geometrically decreasing• Divide each interval 𝑘Δ, 𝑘 + 1 Δ into 𝑖 bins

•𝒂𝒌

𝒂𝒌+𝒊= 𝒆𝝐

𝑎3𝑎1 𝑎2

Δ

𝑎4

𝒂𝟏𝒂𝟑= 𝒆𝝐,

𝒂𝟐𝒂𝟒= 𝒆𝝐

Page 39: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Proof Ideas

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 39

• The shape of p.d.f. in the first interval 0, 𝛥 should be a step function

𝑎3𝑎1 𝑎2

𝑎4

Δ

𝑎3𝑎1 𝑎2

𝑎4

Δ

Increase 𝒂𝟐, decrease 𝒂𝟑

Page 40: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Application: L(x) = |x|

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 40

𝜸∗ =𝟏

𝟏+𝒆𝝐𝟐

, and the minimum expectation of noise amplitude is

𝑽 𝑷𝜸∗ = 𝚫𝒆𝝐𝟐

𝒆𝝐−𝟏

• 𝜖 → 0, the additive gap → 0

• 𝜖 → +∞, 𝑉 𝑃𝛾∗ = Θ(Δe−𝜖

2)

𝑉𝐿𝑎𝑝 =Δ

𝜖

Page 41: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Numeric Comparison with Laplacian Mechanism

3/29/2013 41

0 2 4 6 8 100

5

10

15

VLap /

VO

ptim

al

Page 42: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Numeric Comparison with Laplacian Mechanism

3/29/2013 42

0 2 4 6 8 100

5

10

15

VLap /

VO

ptim

al

10 12 14 16 18 200

500

1000

1500

VLap /

VO

ptim

al

Page 43: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Application: L x = x2

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 43

• (𝑏:= 𝑒−𝜖)

Minimum noise power

Page 44: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Application: L x = x2

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 44

• (𝑏:= 𝑒−𝜖) The minimum noise power is

• 𝜖 → 0, the additive gap ≤ 𝑐Δ2

• 𝜖 → +∞, 𝑉 𝑃𝛾∗ = Θ(Δ2 e−

2𝜖

3 )

𝑉𝐿𝑎𝑝 =2Δ2

𝜖2

Page 45: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Numeric Comparison with Laplacian Mechanism

3/29/2013 45

0 2 4 6 8 100

5

10

15

20

25

VLap /

VO

ptim

al

Page 46: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Numeric Comparison with Laplacian Mechanism

3/29/2013 46

0 2 4 6 8 100

5

10

15

20

25

VLap /

VO

ptim

al

10 12 14 16 18 200

1000

2000

3000

4000

5000

VLap /

VO

ptim

al

Page 47: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Properties of 𝜸∗

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 47

• 𝐿 𝑥 = 𝑥 𝑚,

Also holds for cost functions which are positive linear combinations of momentum functions.

Page 48: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Discrete Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 48

• Query function: q(D) is integer-valued

Page 49: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Discrete Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 49

• Query function: q(D) is integer-valued

Adding data-independent noise is optimal

Page 50: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Discrete Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 50

• Query function: q(D) is integer-valued

Adding data-independent noise is optimal

Page 51: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Abstract Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 51

• 𝐾:𝑫 → 𝑹 (can be arbitrary), with base measure 𝝁.

• Cost function: 𝑪:𝑫 × 𝑹 → [𝟎,+∞]

• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|

Page 52: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Abstract Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 52

• 𝐾:𝑫 → 𝑹 (can be arbitrary), with base measure 𝝁.

• Cost function: 𝑪:𝑫 × 𝑹 → [𝟎,+∞]

• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|

• Staircase mechanism:

Random sampling over the output range with staircase distribution.

Page 53: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Extension to Abstract Setting

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 53

• 𝐾:𝑫 → 𝑹 (can be arbitrary), with base measure 𝝁.

• Cost function: 𝑪:𝑫 × 𝑹 → [𝟎,+∞]

• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|

• Staircase mechanism:Random sampling over the output range with staircase distribution.

• If R is real space, and 𝐶 𝑑, 𝑟 = 𝑟 − 𝑞 𝑑 , regular staircase mechanism

Page 54: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Conclusion

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 54

• Fundamental tradeoff between privacy and utility in DP

• Staircase Mechanism, optimal mechanism for single real-valued query• Huge improvement in low privacy regime

• Extension to discrete setting and abstract setting

Page 55: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: Multidimensional setting:

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 55

• Query output can have multiple components

𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

Page 56: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: Multidimensional setting:

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 56

• Query output can have multiple components

𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• If all components are uncorrelated, perturb each component independently to preserve DP.

Page 57: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: Multidimensional setting:

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 57

• Query output can have multiple components

𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• For some query, all components are coupled together.

Histogram query: one user can affect only one component

Page 58: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: Multidimensional setting:

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 58

• Query output can have multiple components

𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• For some query, all components are coupled together.

• Global sensitivity

𝚫 ≔ max𝑫𝟏,𝑫𝟐 𝒂𝒓𝒆 𝒏𝒆𝒊𝒈𝒉𝒃𝒐𝒓𝒔

|𝒒 𝑫𝟏 − 𝒒 𝑫𝟐 | 𝟏

Histogram query: 𝚫 = 1

Page 59: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: Multidimensional setting:

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 59

• Conjecture: for histogram-like query, optimal noise probability distribution is multidimensional staircase-shaped

Page 60: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Future Work: (𝜖, 𝛿)-differential privacy

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 60

• (𝜖, 𝛿)-differential privacy

𝐏𝐫 𝐊 𝐃𝟏 ∈ 𝐒 ≤ 𝐞𝛜 𝐏𝐫 𝐊 𝐃𝟐 ∈ 𝐒 + 𝛅

• The standard approach is to add Gaussian noise.

• What is the optimal noise probability distribution in this setting?

Page 61: The Optimal Mechanism in Differential Privacy · any two neighboring datasets 1, 2,and all ⊂Range(K), 𝐏𝐫𝐊𝐃 ∈𝐒≤𝐞𝛜𝐏𝐫𝐊𝐃 ∈𝐒 (makehypothesistestinghard)

Thank you!

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 61