TRANSCRIPT
Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
Boosting [Schapire, 1989]
A general method for improving the accuracy of any given learning algorithm.
Example: learning to recognize spam e-mail. A "base learner" receives labeled examples and outputs a heuristic; labels are {+1, -1}. Run the base learner many times and combine the resulting heuristics.
[Figure: the boosting loop. The base learner receives S, labeled examples drawn from the current distribution D, and outputs hypotheses A1, A2, …, each doing well on a (1/2 + η)-fraction of D. After each round D is updated and a termination test is applied; on termination, A1, A2, … are combined into the final hypothesis A.]
[The same figure, annotated with the question: how should D be updated, and how do we combine and decide termination?]
Boosting for People [variant of AdaBoost, FS95]
- Initial distribution D is uniform on database rows.
- S is always a set of k elements drawn i.i.d. from D (S ∼ D^k).
- Combiner is majority.
- Weight update:
  - If correctly classified by the current A_t, decrease weight by a factor of e ("subtract 1 from the exponent").
  - If incorrectly classified by the current A_t, increase weight by a factor of e ("add 1 to the exponent").
  - Re-normalize to obtain the updated D.
(A minimal sketch of this loop follows.)
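A minimal, non-private Python sketch of the loop above, assuming a base_learner callable that maps a labeled sample to a {+1, -1}-valued hypothesis; the termination test is simplified to a fixed round count T, and all names are illustrative rather than from the talk:

```python
import math
import random

def boost_for_people(rows, labels, base_learner, k, T):
    """AdaBoost-style loop from the slide: sample k examples from D,
    run the base learner, update exponents by -1/+1, re-normalize."""
    m = len(rows)
    exponent = [0.0] * m  # weight of row i is exp(exponent[i])
    hypotheses = []
    for _ in range(T):
        # Re-normalize to obtain the current distribution D.
        weights = [math.exp(e) for e in exponent]
        total = sum(weights)
        D = [w / total for w in weights]
        # S: k labeled examples drawn i.i.d. from D (S ~ D^k).
        idx = random.choices(range(m), weights=D, k=k)
        h = base_learner([(rows[i], labels[i]) for i in idx])
        hypotheses.append(h)
        # Correct: subtract 1 from the exponent; incorrect: add 1.
        for i in range(m):
            exponent[i] += -1.0 if h(rows[i]) == labels[i] else 1.0
    # Combiner: majority vote over the collected hypotheses.
    return lambda row: 1 if sum(h(row) for h in hypotheses) >= 0 else -1
```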
Why Does it Work?
Update rule: multiply the weight of element i by exp(-c_t(i)), where c_t(i) = +1 if A_t is correct on i and c_t(i) = -1 otherwise. With m rows, D_1 uniform, and N_t the round-t normalizer:

D_{t+1}(i) = D_t(i) · exp(-c_t(i)) / N_t
N_t · D_{t+1}(i) = D_t(i) · exp(-c_t(i))
N_t N_{t-1} ⋯ N_1 · D_{t+1}(i) = D_1(i) · exp(-∑_s c_s(i))
(∏_s N_s) · D_{t+1}(i) = (1/m) · exp(-∑_s c_s(i))
∑_i (∏_s N_s) · D_{t+1}(i) = (1/m) · ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) · ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) · ∑_i exp(-∑_s c_s(i)), and ∏_s N_s shrinks exponentially (at a rate depending on η):
- The normalizers are sums of weights, and at the start of each round the weights sum to 1.
- Because the base learner is good, "more" weight decreases than increases: more mass has its exponent shrink than otherwise.

Also, ∑_i exp(-∑_s c_s(i)) = ∑_i exp(-y_i ∑_s A_s(i)), since c_s(i) = y_i A_s(i). This is an upper bound on the number of incorrectly classified examples:
- If y_i ≠ sign[∑_s A_s(i)] (= majority{A_1(i), A_2(i), …}), then y_i ∑_s A_s(i) < 0, so exp(-y_i ∑_s A_s(i)) ≥ 1.
Therefore the number of incorrectly classified examples is exponentially small in t. (A numeric check follows.)
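The telescoping identity and the counting bound can be checked numerically. A toy Python run, with a simulated correctness pattern and a deliberately strong base learner so the factor-of-e update visibly shrinks ∏_s N_s (all parameters are made up for illustration):

```python
import math
import random

random.seed(0)
m, T, p_correct = 100, 40, 0.9  # strong base learner, for illustration
D = [1.0 / m] * m               # D_1: uniform on the m rows
prod_N = 1.0
c_sum = [0.0] * m               # c_sum[i] = sum_s c_s(i)
for _ in range(T):
    correct = [random.random() < p_correct for _ in range(m)]
    new_w = [D[i] * math.exp(-1.0 if correct[i] else 1.0) for i in range(m)]
    N = sum(new_w)              # normalizer N_t
    prod_N *= N
    D = [w / N for w in new_w]
    for i in range(m):
        c_sum[i] += 1.0 if correct[i] else -1.0

# Identity: prod_s N_s = (1/m) * sum_i exp(-sum_s c_s(i)).
print(prod_N, sum(math.exp(-c) for c in c_sum) / m)
# Bound: # misclassified by majority vote <= m * prod_s N_s.
print(sum(1 for c in c_sum if c <= 0), "<=", m * prod_N)
```

The identity holds for any ±1 pattern of c_s(i); only the shrinkage of ∏_s N_s uses the base learner's advantage.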
[Figure: the boosting loop instantiated for people. Initially D is uniform on the DB rows; the weight update is the -1/+1-in-the-exponent rule followed by re-normalization; the combiner is majority. Annotation: Privacy?]
Private Boosting for People
- The base learner must be differentially private.
- The main concern is rows whose weight grows too large; this affects the termination test, the sampling, and the re-normalization.
- This is similar to a problem arising when learning in the presence of noise, and it has a similar solution: smooth boosting.
  - Remove (give up on) elements that become too heavy. Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy…
  - Ensure this is rare (else we give up on too many elements and hurt accuracy).
Iterative smoothing: not today.
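The talk defers iterative smoothing, but the naive removal step just described can be sketched in Python (the cap parameter and all names are hypothetical, purely to fix ideas):

```python
def remove_heavy(weights, cap):
    """Repeatedly zero out elements whose normalized mass exceeds cap;
    the loop is needed because re-normalizing can push new elements
    over the cap (exactly the care the slide warns about)."""
    w = list(weights)
    while True:
        total = sum(w)
        assert total > 0, "cap too small: gave up on every element"
        heavy = [i for i, wi in enumerate(w) if wi / total > cap]
        if not heavy:
            return [wi / total for wi in w]  # smoothed distribution
        for i in heavy:
            w[i] = 0.0  # give up on this element
```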
Boosting for Queries?
Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q we can extract from O an approximation of q(DB).
Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D (see the sketch below):
Pr_{q ∼ D} [ |q(O) - q(DB)| < λ ] > 1/2 + η
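For concreteness, the base learner's guarantee as a quantity one could measure with access to DB (a hedged sketch; the query and object interfaces are assumptions, not from the talk):

```python
def advantage(queries, D, O, DB, lam):
    """Mass under D of the queries that O lambda-approximates;
    the assumed base learner guarantees this exceeds 1/2 + eta."""
    return sum(D[j] for j, q in enumerate(queries)
               if abs(q(O) - q(DB)) < lam)
```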
[Figure: the boosting loop instantiated for queries. Initially D is uniform on Q; the weight update is again -1/+1 in the exponent followed by re-normalization; the combiner is the median. Annotations: Privacy? An individual can affect many queries at once!]
Privacy is Problematic
In smooth boosting for people, at each round an individual has only a small effect on the probability distribution. In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q. As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's. This is slightly ameliorated by sampling (if only a few samples are taken, maybe we can avoid the q's on the edge?).
How can we make the re-weighting less sensitive?
Private Boosting for Queries [variant of AdaBoost]
- Initial distribution D is uniform on the queries in Q.
- S is always a set of k queries drawn i.i.d. from D (S ∼ D^k).
- Combiner is the median [viz. Freund92].
- Weight update for queries (sketched in code below):
  - If very well approximated by A_t (error at most λ), decrease weight by a factor of e ("-1").
  - If very poorly approximated by A_t (error at least λ + μ), increase weight by a factor of e ("+1").
  - In between, scale with the distance from the midpoint λ + μ/2 (down or up):
    2·( |q(DB) - q(A_t)| - (λ + μ/2) ) / μ    (sensitivity 2ρ/μ)
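A Python sketch of this re-weighting, reading "-1"/"+1" as the change to the query's exponent (the λ and λ + μ thresholds are the endpoints of the linear ramp; names are illustrative):

```python
def query_exponent_update(err, lam, mu):
    """Change to a query's exponent, given err = |q(DB) - q(A_t)|.
    Returns a value in [-1, +1]: -1 (weight shrinks by a factor of e)
    when well approximated, +1 when poorly approximated, and a
    linear ramp in between."""
    if err <= lam:
        return -1.0
    if err >= lam + mu:
        return +1.0
    # Scale with the distance from the midpoint lam + mu/2.
    return 2.0 * (err - (lam + mu / 2.0)) / mu
```

Since a single row changes q(DB) by at most ρ, it changes this value by at most 2ρ/μ: the sensitivity quoted above, and the reason this rule is less sensitive than the hard -1/+1 update.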
Theorem (minus some parameters)
Let all q ∈ Q have sensitivity ≤ ρ. Run the query-boosting algorithm for T = log|Q| / η² rounds with
μ = ( (log|Q| / η²)² · ρ · √k ) / ε
The resulting object O is (ε + Tε₀, Tδ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q.
- Better privacy (smaller ε) gives worse utility (larger μ).
- A better base learner (smaller k, larger η) helps.
Proving Privacy
Technique #1: Pay Your Debt and Move On.
- Fix A_1, A_2, …, A_t and record the D vs. D' confidence gain so far ("pay your debt").
- Focus on the gain in the selection of S ∈ Q^k in round t+1 ("move on").
- That selection is based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D, D'.
Technique #2: Evolution of Confidence [DiDwN03]: "delay payment until the final reckoning". Choose q₁, q₂, …, in turn.
- For each q ∈ Q, bound the worst-case log-ratio |ln( D[q] / D'[q] )| ≤ A and the expectation |E_{q∼D} ln( D[q] / D'[q] )| ≤ B.
- Then, by an Azuma-style martingale bound,
Pr_{q₁,…,q_k} [ |∑_i ln( D[q_i] / D'[q_i] )| > z√k·(A + B) + kB ] < exp(-z²/2)
(A toy simulation of this bound follows.)
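A toy Monte Carlo illustration of the tail bound (the distributions are made up; they matter only through the measured bounds A and B):

```python
import math
import random

random.seed(1)
D  = [0.30, 0.20, 0.25, 0.25]   # toy distributions over 4 queries
Dp = [0.25, 0.24, 0.26, 0.25]
logratio = [math.log(D[q] / Dp[q]) for q in range(4)]
A = max(abs(r) for r in logratio)                    # worst-case bound
B = abs(sum(D[q] * logratio[q] for q in range(4)))   # expectation bound

k, z, trials = 50, 2.0, 20000
bound = z * math.sqrt(k) * (A + B) + k * B
exceed = sum(
    1 for _ in range(trials)
    if abs(sum(logratio[q]
               for q in random.choices(range(4), weights=D, k=k))) > bound
)
print(exceed / trials, "<", math.exp(-z * z / 2))    # empirically holds
```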
Bounding E_{q∼D} ln( D[q] / D'[q] )
Assume D and D' are A-dp with respect to one another (so |ln( D(q)/D'(q) )| ≤ A for every q), with A < 1. Then
0 ≤ E_{q∼D} ln[ D(q)/D'(q) ] ≤ 2A²   (that is, B ≤ 2A²).

KL(D||D') = ∑_q D(q) · ln[ D(q)/D'(q) ]; always ≥ 0.
So:
KL(D||D') ≤ KL(D||D') + KL(D'||D)
= ∑_q D(q)·( ln[ D(q)/D'(q) ] + ln[ D'(q)/D(q) ] ) + ( D'(q) - D(q) )·ln[ D'(q)/D(q) ]
≤ ∑_q 0 + |D'(q) - D(q)|·A
= A · ∑_q [ max(D(q), D'(q)) - min(D(q), D'(q)) ]
≤ A · ∑_q [ e^A·min(D(q), D'(q)) - min(D(q), D'(q)) ]
= A · ∑_q (e^A - 1)·min(D(q), D'(q))
≤ 2A²  when A < 1  (since e^A - 1 ≤ 2A for A < 1 and ∑_q min(D(q), D'(q)) ≤ 1)
Compare DiDwN03. (A numeric spot-check follows.)
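A quick numeric spot-check of B ≤ 2A² on made-up nearby distributions (purely illustrative):

```python
import math
import random

random.seed(2)
n = 6
base = [random.random() + 0.5 for _ in range(n)]
D = [b / sum(base) for b in base]
# Perturb multiplicatively, then re-normalize, to get a nearby D'.
pert = [D[q] * math.exp(random.uniform(-0.25, 0.25)) for q in range(n)]
Dp = [p / sum(pert) for p in pert]

A = max(abs(math.log(D[q] / Dp[q])) for q in range(n))    # measured A < 1
kl = sum(D[q] * math.log(D[q] / Dp[q]) for q in range(n)) # KL(D||D')
assert 0.0 <= kl <= 2 * A * A
print(f"A = {A:.3f}, KL(D||D') = {kl:.5f}, 2A^2 = {2*A*A:.5f}")
```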
Motivation and Application
Boosting for people:
- Logistic regression for 3000+-dimensional data. A slight twist on CM did pretty well (ε = 1.5). Thought about alternatives.
Boosting for queries:
- Reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk). We had over-interpreted the polytime DiNi-style attacks (we were spoiled):
  - Can't have cn queries with error o(√n).
  - BLR08: can have cn queries with error O(n^{2/3}).
  - DNRRV09: O(n^{1/2} · |Q|^{o(1)}).
  - Now: O(n^{1/2} · log²|Q|).
- The result is more general, but we only know of a base learner for counting queries.