the optimal mechanism in differential privacy · any two neighboring datasets 1, 2,and all...

The Optimal Mechanism in Differential Privacy

Quan GengAdvisor: Prof. Pramod Viswanath

3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 1

Outline

•Background on differential privacy

•Problem formulation

•Main result on the optimal mechanism

• Extensions to other settings

•Conclusion and future work


•Vast amounts of personal information are collected

•Data analysis produces a lot of useful applications

Motivation


•How to release the data while protecting individual’s privacy?

Motivation


Big DataRelease data/info.

Analyst

Applications

Motivation: Netflix Prize

• (2006) Netflix hosted a contest to improve movie recommendation

•Released dataset• 100 million ratings

• 500 thousand users

• 18 thousand movies


Image Credit: Arvind Narayanan

Motivation: Netflix Prize

• (2006) Netflix hosted a contest to improve movie recommendation

•Released dataset• 100 million ratings

• 500 thousand users

• 18 thousand movies

•Anonymization to protect customer privacy



Motivation: Netflix Prize Dataset Release



[Narayanan, Shmatikov 2008]

Motivation


• How to protect PRIVACY resilient to attacks with arbitrary side information?

Motivation



• Ans: randomized releasing mechanism

Motivation




•How much randomness is needed?• completely random (no utility)

• deterministic (no privacy)

Motivation




•How much randomness is needed?• completely random (no utility)

• deterministic (no privacy)

• Differential Privacy [Dwork et. al. 06]: one way to quantify the level of randomness and privacy

Background on Differential Privacy


database

Collect data

Users

Database Curator

Analyst

Query

Released info.

• Database Curator: How to answer a query, providing usefuldata information to the analyst, while still protecting the privacy of each user.



database

Collect data

Users

Database Curator

Analyst

Query

Released info.

dataset: D query function: 𝑞 (D) randomized released mechanism: K DAge

A: 20

B: 35

C: 43

D: 30

q: How many people are older than 32?

q(D) = 2

K(D) = 1 w.p. 1/8K(D) = 2 w.p. 1/2K(D) = 3 w.p. 1/4K(D) = 4 w.p. 1/8



• Two neighboring datasets: differ only at one element

Age

A: 20

B: 35

C: 43

D: 30

Age

A: 20

B: 35

D: 30

𝐷1 𝐷2



• Two neighboring datasets: differ only at one element

Age

A: 20

B: 35

C: 43

D: 30

Age

A: 20

B: 35

D: 30

𝐷1 𝐷2

Differential Privacy: presence or absence of any individual record in the dataset should not affect the released information significantly.

𝐾 𝐷1 ≈ 𝐾(𝐷2)



• A randomized mechanism 𝐾 gives 𝝐-differential privacy, if for any two neighboring datasets 𝐷1, 𝐷2 , and all 𝑆 ⊂Range(K),

𝐏𝐫 𝐊 𝐃𝟏 ∈ 𝐒 ≤ 𝐞𝛜 𝐏𝐫 𝐊 𝐃𝟐 ∈ 𝐒

(make hypothesis testing hard)

• 𝝐 quantifies the level of privacy• 𝜖 → 0, high privacy• 𝜖 → +∞, low privacy• a social question to choose 𝝐 (can be 0.01, 0.1, 1, 10 …)



• Q: How much perturbation needed to achieve 𝜖-DP?

• A: depends on how different 𝑞 𝐷1 , 𝑞(𝐷2) are





𝑞 𝐷1 , 𝑞(𝐷2)

If 𝑞 𝐷1 = 𝑞 𝐷2 , no perturbation is needed.





𝑓𝐷1(𝑥) 𝑓𝐷2(𝑥)

𝑞 𝐷1 𝑞(𝐷2)

If |𝑞 𝐷1 − 𝑞 𝐷2 | is small, use small perturbation





𝑓𝐷1(𝑥)𝑓𝐷2(𝑥)

𝑞 𝐷1 𝑞(𝐷2)

If |𝑞 𝐷1 − 𝑞 𝐷2 | is large, use large perturbation



• Global Sensitivity Δ: how different when q is applied to neighboring datasets

𝜟 ≔ 𝐦𝐚𝐱(𝑫𝟏,𝑫𝟐) 𝒂𝒓𝒆 𝒏𝒆𝒊𝒈𝒉𝒃𝒐𝒓𝒔

|𝒒 𝑫𝟏 − 𝒒 𝑫𝟐 |

• Example: for a count query, Δ = 1



• Laplace Mechanism:

𝑲 𝑫 = 𝒒 𝑫 + 𝑳𝒂𝒑(Δ

𝜖),

𝐿𝑎𝑝𝛥

𝜖is a r.v. with p.d.f 𝑓 𝑥 =

1

2𝜆𝑒−

|𝑥|

𝜆 , 𝜆 =Δ

𝜖

• Basic tool in DP

𝑓𝐷1(𝑥) 𝑓𝐷2(𝑥)

Optimality of Existing work?




𝜖),

• Two Questions:• Is data-independent perturbation optimal?

• Assume data-independent perturbation, is Laplacian distribution optimal?

Optimality of Existing work?




𝜖),

• Two questions:• Is data-independent perturbation optimal?• Assume data-independent perturbation, is Laplacian distribution

optimal?

• Our results:• data-independent perturbation is optimal• Laplacian distribution is not optimal: the optimal is staircase

distribution

Problem Formulation: DP constraint


•A generic randomized mechanism 𝐾 is a family of noise probability measures (t is query output)

𝑲 = {𝝂𝒕 ∶ 𝒕 ∈ 𝑹 }

•DP constraint: ∀𝑡1, 𝑡2 ∈ 𝑅, 𝑠. 𝑡. 𝑡1 − 𝑡2 ≤ Δ,𝛎𝒕𝟏 𝑺 ≤ 𝒆𝝐𝛎𝒕𝟐 𝐒 + 𝒕𝟏 − 𝒕𝟐 , ∀measurable set S

Problem Formulation: utility (cost) model


• Cost function on the noise𝐿: 𝑅 → 𝑅

• Given query output 𝑡, Cost(𝜈𝑡) = ∫ 𝐿 𝑥 𝜈𝑡 𝑑𝑥

• Example• 𝐿(𝑥) = |𝑥|, noise magnitude• 𝐿 𝑥 = 𝑥 2, noise power

• Objective(minmax): minimize sup

t∈𝑅∫𝑥∈𝑅 𝐿 𝑥 𝜈𝑡(𝑑𝑥)

Problem Formulation: put things all together


𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 supt∈𝑅

𝑥∈𝑅

𝐿 𝑥 𝜈𝑡(𝑑𝑥)

s.t. 𝛎t1 S ≤ eϵ𝛎t2 S + t1 − t2 ,

∀measurable set S, ∀t1, t2 ∈ R, s. t. t1 − t2 ≤ Δ,

Main Result: 𝜈𝑡 = 𝜈


In the optimal mechanism, 𝜈𝑡 is independent of 𝑡 (under a technical condition).



In the optimal mechanism, 𝜈𝑡 is independent of 𝑡 (under a technical condition).

𝜈𝑡(𝑆)

𝑡𝑇 2𝑇-𝑇 𝑇

𝑛

2𝑇

𝑛



𝜈𝑡(𝑆)

𝑡𝑇 2𝑇-𝑇 𝑇

𝑛2𝑇

𝑛

In the optimal mechanism among ∪𝑇>0∪𝑛≥1 𝐾𝑇,𝑛,

𝜈𝑡 is independent of 𝑡

Optimal noise probability distribution


𝑲 𝑫 = 𝒒 𝑫 + 𝑿,

Minimize ∫ L x P dxs.t. Pr X ∈ S ≤ eϵ Pr X ∈ S + d ,

∀ d ≤ Δ,measurable set S

Optimal noise probability distribution


𝑲 𝑫 = 𝒒 𝑫 + 𝑿,

• 𝐿 𝑥 satisfies:• symmetric, and increasing of |x|

• sup𝐿 𝑇+1

𝐿 𝑇< +∞

Minimize ∫ L x P dxs.t. Pr X ∈ S ≤ eϵ Pr X ∈ S + d ,

∀ d ≤ Δ,measurable set S

Main Result


The optimal probability distribution has

staircase-shaped p.d.f.

Main Result


Main Result


R.V. generation• Sign: binary r.v.• Interval: geometric r.v.• Segment: binary r.v.• Uniform r.v.

Proof Ideas


• Key idea:

• Divide each interval [𝑘Δ,(𝑘+1)Δ) into 𝑖 bins

• Approximate using piecewise constant p.d.f.

𝑎1

𝑎2𝑎3

𝑎𝑖 =𝑃(𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙)

𝑙𝑒𝑛𝑔𝑡ℎ(𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙)

Proof Ideas


• p.d.f. should be decreasing

𝑎1

𝑎2 𝑎3

𝑎1

𝑎2 𝑎3

Proof Ideas


• p.d.f. should be geometrically decreasing• Divide each interval 𝑘Δ, 𝑘 + 1 Δ into 𝑖 bins

•𝒂𝒌

𝒂𝒌+𝒊= 𝒆𝝐

𝑎3𝑎1 𝑎2

Δ

𝑎4

2Δ

𝒂𝟏𝒂𝟑= 𝒆𝝐,

𝒂𝟐𝒂𝟒= 𝒆𝝐

Proof Ideas


• The shape of p.d.f. in the first interval 0, 𝛥 should be a step function

𝑎3𝑎1 𝑎2

𝑎4

Δ

𝑎3𝑎1 𝑎2

𝑎4

Δ

Increase 𝒂𝟐, decrease 𝒂𝟑

Application: L(x) = |x|


𝜸∗ =𝟏

𝟏+𝒆𝝐𝟐

, and the minimum expectation of noise amplitude is

𝑽 𝑷𝜸∗ = 𝚫𝒆𝝐𝟐

𝒆𝝐−𝟏

• 𝜖 → 0, the additive gap → 0

• 𝜖 → +∞, 𝑉 𝑃𝛾∗ = Θ(Δe−𝜖

2)

𝑉𝐿𝑎𝑝 =Δ

𝜖

Numeric Comparison with Laplacian Mechanism

3/29/2013 41

0 2 4 6 8 100

5

10

15

VLap /

VO

ptim

al


3/29/2013 42

0 2 4 6 8 100

5

10

15

VLap /

VO

ptim

al

10 12 14 16 18 200

500

1000

1500

VLap /

VO

ptim

al

Application: L x = x2


• (𝑏:= 𝑒−𝜖)

Minimum noise power

Application: L x = x2


• (𝑏:= 𝑒−𝜖) The minimum noise power is

• 𝜖 → 0, the additive gap ≤ 𝑐Δ2

• 𝜖 → +∞, 𝑉 𝑃𝛾∗ = Θ(Δ2 e−

2𝜖

3 )

𝑉𝐿𝑎𝑝 =2Δ2

𝜖2


3/29/2013 45

0 2 4 6 8 100

5

10

15

20

25

VLap /

VO

ptim

al


3/29/2013 46

0 2 4 6 8 100

5

10

15

20

25

VLap /

VO

ptim

al

10 12 14 16 18 200

1000

2000

3000

4000

5000

VLap /

VO

ptim

al

Properties of 𝜸∗


• 𝐿 𝑥 = 𝑥 𝑚,

Also holds for cost functions which are positive linear combinations of momentum functions.

Extension to Discrete Setting


• Query function: q(D) is integer-valued




Adding data-independent noise is optimal

Extension to Abstract Setting


• 𝐾:𝑫 → 𝑹 (can be arbitrary), with base measure 𝝁.

• Cost function: 𝑪:𝑫 × 𝑹 → [𝟎,+∞]

• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|





• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|

• Staircase mechanism:

Random sampling over the output range with staircase distribution.





• Sensitivity

Δ ≔ max𝑟∈𝑅,𝐷1,𝐷2: 𝐷1−𝐷2 ≤1

|𝐶 𝐷1, 𝑟 − 𝐶(𝐷2, 𝑟)|

• Staircase mechanism:Random sampling over the output range with staircase distribution.

• If R is real space, and 𝐶 𝑑, 𝑟 = 𝑟 − 𝑞 𝑑 , regular staircase mechanism

Conclusion


• Fundamental tradeoff between privacy and utility in DP

• Staircase Mechanism, optimal mechanism for single real-valued query• Huge improvement in low privacy regime

• Extension to discrete setting and abstract setting

Future Work: Multidimensional setting:


• Query output can have multiple components

𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫




𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• If all components are uncorrelated, perturb each component independently to preserve DP.




𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• For some query, all components are coupled together.

Histogram query: one user can affect only one component




𝒒 𝑫 = 𝒒𝟏 𝑫 , 𝒒𝟐 𝑫 ,… , 𝒒𝒏 𝑫

• For some query, all components are coupled together.

• Global sensitivity

𝚫 ≔ max𝑫𝟏,𝑫𝟐 𝒂𝒓𝒆 𝒏𝒆𝒊𝒈𝒉𝒃𝒐𝒓𝒔

|𝒒 𝑫𝟏 − 𝒒 𝑫𝟐 | 𝟏

Histogram query: 𝚫 = 1



• Conjecture: for histogram-like query, optimal noise probability distribution is multidimensional staircase-shaped

Future Work: (𝜖, 𝛿)-differential privacy


• (𝜖, 𝛿)-differential privacy

𝐏𝐫 𝐊 𝐃𝟏 ∈ 𝐒 ≤ 𝐞𝛜 𝐏𝐫 𝐊 𝐃𝟐 ∈ 𝐒 + 𝛅

• The standard approach is to add Gaussian noise.

• What is the optimal noise probability distribution in this setting?

Thank you!


the optimal mechanism in differential privacy · any two neighboring datasets 1, 2,and all...

Documents