
Page 1

Differential privacy and Bayesian learning

Antti Honkela

Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics & Department of Public Health, University of Helsinki

Workshop on Advances in Approximate Bayesian Inference
NIPS 2017, 8 December 2017

Page 2

The need for privacy: theory

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

—Universal Declaration of Human Rights, Article 12

Page 3

The need for privacy: practice

- Lack of data is a major impediment to the development of personalised medicine, but data cannot be shared without strong privacy protection

- The European General Data Protection Regulation (GDPR), coming into force in May 2018, sets strict conditions for the use of private information, with huge fines for breaches

- ML models retain a lot of private information and are vulnerable to model inversion attacks

- Simple methods without strong theoretical guarantees cannot be relied upon in the long term

Page 4

ε-differential privacy (DP; Dwork et al., 2006)

Definition

An algorithm M operating on a data set D is said to be ε-differentially private (ε-DP) if for any two data sets D and D′ differing only by one sample, the probabilities of obtaining any set of results S fulfil

$$\frac{\Pr(M(D) \in S)}{\Pr(M(D') \in S)} \le e^{\varepsilon}.$$
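To make the definition concrete, the classic randomized response mechanism on a single bit satisfies it: the likelihood ratio between any two inputs is exactly e^ε. A minimal sketch (the function name and NumPy interface are mine, not from the slides):

```python
import numpy as np

def randomized_response(bit, epsilon, rng=np.random.default_rng()):
    """Report the true bit with probability e^eps / (1 + e^eps), else its flip.

    Pr(output = b | input = b) / Pr(output = b | input = 1 - b) = e^eps,
    so the release of a single record is eps-DP.
    """
    p_true = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p_true else 1 - bit
```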

Page 5

(ε, δ)-differential privacy

Definition

An algorithm M operating on a data set D is said to be (ε, δ)-differentially private ((ε, δ)-DP) if for any two data sets D and D′ differing only by one sample, the probabilities of obtaining any set of results S fulfil

$$\Pr(M(D) \in S) \le e^{\varepsilon} \Pr(M(D') \in S) + \delta.$$

Page 6

DP release of a function value f(D)

1. Evaluate the sensitivity of f:
$$\Delta_p f = \max_{\|D - D'\| = 1} \|f(D) - f(D')\|_p.$$

2. Compute
$$M(D) = f(D) + \frac{\Delta_p f}{\varepsilon}\, \eta$$
with

- Laplace mechanism: pure ε-DP with p = 1 and η ∼ Laplace(0, 1)
- Gaussian mechanism: (ε, δ)-DP with p = 2 and η ∼ N(0, 2 ln(1.25/δ))
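A minimal sketch of both mechanisms, assuming the caller supplies the sensitivity Δ_p f (the function names and NumPy interface are mine):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Pure eps-DP release: f(D) + (Delta_1 f / eps) * Laplace(0, 1) noise."""
    eta = rng.laplace(0.0, 1.0, size=np.shape(value))
    return value + (sensitivity / epsilon) * eta

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=np.random.default_rng()):
    """(eps, delta)-DP release: f(D) + (Delta_2 f / eps) * N(0, 2 ln(1.25/delta)) noise."""
    eta = rng.normal(0.0, np.sqrt(2.0 * np.log(1.25 / delta)), size=np.shape(value))
    return value + (sensitivity / epsilon) * eta
```

For example, the mean of n values known to lie in [0, 1] has Δ₁f = 1/n under the one-sample neighbourhood above, so laplace_mechanism(x.mean(), 1.0/len(x), eps) releases it with ε-DP.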

Page 7

DP choice among alternatives

1. Evaluate the sensitivity of the utility u(D, r) over a choice between r ∈ R:
$$\Delta u = \max_{r \in R} \max_{\|D - D'\| = 1} |u(D, r) - u(D', r)|.$$

2. Exponential mechanism: choose r with probability proportional to
$$p(r) \propto \exp\left( \frac{\varepsilon\, u(D, r)}{2 \Delta u} \right).$$

This is ε-DP (McSherry & Talwar, FOCS 2007).
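A sketch of the exponential mechanism for a finite candidate set (names and interface are mine; the utility u and its sensitivity Δu are supplied by the caller):

```python
import numpy as np

def exponential_mechanism(D, R, u, delta_u, epsilon, rng=np.random.default_rng()):
    """Choose r in R with probability proportional to exp(eps * u(D, r) / (2 * delta_u))."""
    scores = np.array([u(D, r) for r in R])
    # Shift by the max before exponentiating for numerical stability;
    # the normalised probabilities are unchanged.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * delta_u))
    probs = weights / weights.sum()
    return R[rng.choice(len(R), p=probs)]
```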

Page 8

DP Bayesian inference

[Figure: schematic of DP Bayesian inference. The data X1, X2 with latent variables Y1, Y2 and parameters θ1, θ2 of the probabilistic model sit behind a privacy wall; (approximate) Bayesian inference produces a privacy-aware posterior approximation on the public side.]

Page 9

DP mechanisms for Bayesian inference

Three approaches to DP Bayesian inference:

1. Drawing single samples from the posterior with the exponential mechanism (Dimitrakakis et al., ALT 2014; Wang et al., ICML 2015; Geumlek et al., NIPS 2017)

2. Sufficient statistic perturbation (SSP) for exponential family models with the Laplace/Gaussian mechanism (e.g. Foulds et al., UAI 2016; Honkela et al., 2016; Park et al., 2016)

3. Perturbation of gradients in SG-MCMC (Wang et al., ICML 2015) or VI (Jalko et al., UAI 2017) with the Laplace/Gaussian mechanism

Page 10

Approach 1: Posterior sampling mechanisms

- Elegant and seemingly broadly applicable, but
- ... requires a bounded or Lipschitz likelihood (proofs can be challenging)
- ... one sample tells very little, and the privacy cost of additional samples is linear in the number of samples
- ... the privacy guarantee is conditional on exact sampling, which is infeasible for most models

Page 11

Approach 2: Sufficient statistic perturbation (SSP)

For exponential family models, all information about the data D = {x_1, …, x_n} is contained in the sum of sufficient statistics ∑_i S(x_i).

This suggests a differentially private mechanism where we apply e.g. the Laplace mechanism to the sum to obtain perturbed sufficient statistics
$$M(D) = \sum_i S(x_i) + \xi, \qquad \xi \sim \mathrm{Lap}(0, \Delta_1(S)/\varepsilon),$$
and then proceed with the inference as usual (Foulds et al., UAI 2016; Honkela et al., 2016).
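As a concrete instance, consider a Beta-Bernoulli model, where S(x) = x so that Δ₁(S) = 1 (a minimal sketch; the clamping of the noisy count to its feasible range is my own addition):

```python
import numpy as np

def ssp_beta_posterior(x, epsilon, alpha0=1.0, beta0=1.0, rng=np.random.default_rng()):
    """SSP for binary data x_i in {0, 1} with a Beta(alpha0, beta0) prior."""
    n = len(x)
    # Changing one x_i moves sum_i S(x_i) by at most 1, so Delta_1(S) = 1.
    noisy_sum = np.sum(x) + rng.laplace(0.0, 1.0 / epsilon)
    # Keep the perturbed statistic in its feasible range [0, n] (not on the slide).
    noisy_sum = float(np.clip(noisy_sum, 0.0, n))
    # Proceed with inference as usual: conjugate Beta update on the noisy statistic.
    return alpha0 + noisy_sum, beta0 + (n - noisy_sum)
```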

Page 12

Consistency and efficiency

- Consistency: SSP DP estimates of posterior mean parameters converge to the corresponding non-private values as n → ∞:

$$\theta_M = \frac{\tau + M(D)}{n + n_0} = \frac{\tau + \sum_i S(x_i) + \xi}{n + n_0} = \frac{\tau + \sum_i S(x_i)}{n + n_0} + \frac{\xi}{n + n_0} \;\xrightarrow{p}\; \frac{\tau + \sum_i S(x_i)}{n + n_0} = \theta_{NP}.$$

- The convergence rate O(1/n) is optimal for any (ε, δ)-DP mechanism.

Page 13

SSP results on drug sensitivity prediction

[Figure: Wpc-index (0.50–0.56) as a function of dataset size (internal + external), comparing LR on non-private data (d = 10 and d = 64), non-private linear regression (LR), non-private lasso, robust private LR, private LR, output perturbed LR and functional mechanism LR.]

Mrinal Das, Arttu Nieminen

Page 14

Practical challenges in efficient DP Bayesian learning

- Latent variables cannot be shared
- Asymptotic efficiency is insufficient to guarantee practical efficiency
- High-dimensional data needs more DP noise
  - More aggressive dimensionality reduction than usual is often needed
- Further: a single outlier can impose huge sensitivity on the data
  - A lot of noise must be injected in DP to mask it
  - The useful contribution such points make to learning is at best minimal

Page 15

Decreasing the sensitivity by clipping

[Figure: rank correlation (0.1–0.35) as a function of the clipping bound (0–2 data standard deviations), for n = 1000 and n = 3000, DP and non-private.]

Page 16

Approach 3: DP gradient perturbation methods

Assume the target L(θ, X) = ∑_i L_i(θ, x_i) (posterior or ELBO).

1. Each g_i(θ) = ∇_θ L_i(θ, x_i) is clipped s.t. ‖g_i(θ)‖₂ ≤ c_t in order to bound the gradient sensitivity
2. Subsample the x_i with frequency q in order to use the privacy amplification theorem
3. Gradient contributions from all data samples in the mini-batch are summed and perturbed with Gaussian noise N(0, σ²I)
4. The total privacy cost can be computed from composition theorems or using the moments accountant (Abadi et al., CCS 2016)
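A minimal sketch of one such step, assuming a caller-supplied per-example gradient function (the interface and the lot-size normalisation are my choices; σ must still be calibrated to the clipping bound c and the desired (ε, δ), e.g. via the moments accountant):

```python
import numpy as np

def dp_gradient_step(theta, X, grad_i, c=1.0, q=0.01, sigma=1.0, lr=0.1,
                     rng=np.random.default_rng()):
    """One DP gradient step: subsample, clip per-example gradients, add Gaussian noise."""
    # 2. Poisson-subsample each example with frequency q (privacy amplification).
    batch = X[rng.random(len(X)) < q]
    grad_sum = np.zeros_like(theta)
    for x in batch:
        g = grad_i(theta, x)
        # 1. Clip so that ||g_i(theta)||_2 <= c, bounding the sensitivity of the sum.
        grad_sum += g / max(1.0, np.linalg.norm(g) / c)
    # 3. Perturb the summed gradient with Gaussian noise N(0, sigma^2 I).
    noisy_sum = grad_sum + rng.normal(0.0, sigma, size=theta.shape)
    # 4. Repeated steps compose; track the total (eps, delta) with an accountant.
    return theta + lr * noisy_sum / (q * len(X))  # ascent, normalised by expected lot size
```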

Page 17

DPVI logistic regression results on UCI Adult

[Figure: test accuracy (0.55–0.80) on the Adult data as a function of the total privacy budget ε_tot (10⁻¹ to 10²), comparing Non-DP, DP-SGLD, DPVI and DPVI-MA.]

Joonas Jalko, Onur Dikmen

Page 18

Open questions / research directions

- Practical methods for making inference aware of the injected DP noise
  - Cf. Williams & McSherry (NIPS 2010)
- Posterior uncertainty calibrated with the injected DP noise
  - Many current methods ignore this, running inference as if there were no extra noise
- Model engineering for DP: decreasing sensitivity
  - Robust models should in theory be better for privacy, but we need to make sure our methods can take advantage of this

Page 19

Conclusion

- DP as a strong privacy framework
- Sufficient statistic perturbation is asymptotically consistent and efficient for exponential family models
- For finite data: dimensionality reduction and clipping the data are essential for obtaining good performance
- DPVI and DP-SG-MCMC are applicable to more general models
- DP can be combined with encryption to run securely on distributed data (e.g. Heikkila et al., NIPS 2017)

Page 20

Acknowledgements

Mrinal Das, Onur Dikmen, Mikko Heikkila, Eemil Lagerspetz, Joonas Jalko, Arttu Nieminen, Sasu Tarkoma, Kana Shimizu, Samuel Kaski

Funding: Academy of Finland