
Post on 14-Jan-2016








Boosting and Differential Privacy

Cynthia Dwork, Microsoft Research

The Power of Small, Private, Miracles

Joint work with Guy Rothblum and Salil Vadhan

Boosting [Schapire, 1989]: a general method for improving the accuracy of any given learning algorithm.

Example: learning to recognize spam e-mail.

A “base learner” receives labeled examples and outputs a heuristic. Labels are {+1, −1}.

Run it many times; combine the resulting heuristics.

[Figure: the boosting loop. S (labeled examples from D) → Base Learner → A_1, A_2, … (each does well on $\tfrac12 + \eta$ of D) → Update D; finally combine A_1, A_2, ….]



How should D be updated? When should we terminate?

Boosting for People [Variant of AdaBoost, FS95]

Initial distribution D is uniform on the database rows. S is always a sample of k elements drawn from $D^k$.

Combiner is majority.

Weight update:

If a row is correctly classified by the current A, decrease its weight by a factor of e (“subtract 1 from the exponent”).

If a row is incorrectly classified by the current A, increase its weight by a factor of e (“add 1 to the exponent”).

Re-normalize to obtain the updated D.
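A minimal Python sketch of one round of the update above, assuming D is a list of probabilities over the m database rows and that we already know which rows the current heuristic got right. The helper names (`sample_S`, `update_people_weights`, `majority`) are hypothetical, and breaking majority ties toward +1 is an assumption the slides do not address.

```python
import math
import random


def sample_S(dist, k, rng):
    """Draw S as k row indices i.i.d. from D, i.e. one draw from D^k."""
    return rng.choices(range(len(dist)), weights=dist, k=k)


def update_people_weights(dist, correct):
    """Multiply each row's weight by e^-1 (correct) or e^+1 (incorrect), then re-normalize.

    Returns the updated distribution and the normalizer N_t.
    """
    raw = [w * math.exp(-1.0 if ok else 1.0) for w, ok in zip(dist, correct)]
    n_t = sum(raw)
    return [w / n_t for w in raw], n_t


def majority(predictions):
    """Combine +/-1 predictions A_1(i), A_2(i), ...; ties go to +1 (assumption)."""
    return 1 if sum(predictions) >= 0 else -1


rng = random.Random(0)
D = [0.25] * 4                         # uniform on 4 rows
S = sample_S(D, k=3, rng=rng)          # the base learner would be trained on these rows
D, N = update_people_weights(D, [True, True, True, False])
```

After this round the misclassified row carries $e^2$ times the weight of each correctly classified row.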

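Why Does It Work?

Update rule: multiply the weight of row $i$ by $\exp(-c_t(i))$, where $c_t(i) = +1$ if $A_t(i)$ is correct and $-1$ otherwise (that is, $c_t(i) = y_i A_t(i)$), and $N_t$ is the normalizer:

$$D_{t+1}(i) = \frac{D_t(i)\,\exp(-c_t(i))}{N_t}$$

$$N_t\, D_{t+1}(i) = D_t(i)\,\exp(-c_t(i))$$

$$N_t N_{t-1}\cdots N_1\, D_{t+1}(i) = D_1(i)\,\exp\Big(-\sum_s c_s(i)\Big)$$

$$\Big(\prod_s N_s\Big)\, D_{t+1}(i) = \frac{1}{m}\,\exp\Big(-\sum_s c_s(i)\Big)$$

$$\sum_i \Big(\prod_s N_s\Big)\, D_{t+1}(i) = \frac{1}{m}\sum_i \exp\Big(-\sum_s c_s(i)\Big)$$

$$\prod_s N_s = \frac{1}{m}\sum_i \exp\Big(-\sum_s c_s(i)\Big)$$

Here $m$ is the number of database rows ($D_1$ is uniform), and the last step uses $\sum_i D_{t+1}(i) = 1$.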
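The telescoping argument can be checked numerically. This sketch (the helper name is hypothetical) draws random ±1 values for $c_s(i)$ as a stand-in for a real base learner, runs the normalized update for t rounds starting from the uniform $D_1$, and compares $\prod_s N_s$ with $\frac1m\sum_i \exp(-\sum_s c_s(i))$.

```python
import math
import random


def normalizer_identity(m, t, seed=0):
    """Return (prod_s N_s, (1/m) * sum_i exp(-sum_s c_s(i))) for random c_s(i) in {+1, -1}."""
    rng = random.Random(seed)
    # c[s][i] = +1 if A_s classified row i correctly, -1 otherwise (random stand-in)
    c = [[rng.choice([1, -1]) for _ in range(m)] for _ in range(t)]
    dist = [1.0 / m] * m                    # D_1 is uniform on the m rows
    prod_n = 1.0
    for s in range(t):
        raw = [dist[i] * math.exp(-c[s][i]) for i in range(m)]
        n_s = sum(raw)                      # normalizer N_s
        dist = [w / n_s for w in raw]       # D_{s+1}
        prod_n *= n_s
    rhs = sum(math.exp(-sum(c[s][i] for s in range(t))) for i in range(m)) / m
    return prod_n, rhs


lhs, rhs = normalizer_identity(m=50, t=10)
```

The two values agree up to floating-point rounding, whatever the c's are, because the identity uses nothing about the base learner.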


$$\prod_s N_s = \frac{1}{m}\sum_i \exp\Big(-\sum_s c_s(i)\Big)$$

$\prod_s N_s$ is shrinking exponentially (at a rate that depends on $\eta$).

Normalizers are sums of weights, and at the start of each round the weights sum to 1. There is “more” decrease than increase (because the base learner is good): more weight has its exponent shrink than grow.

$$\sum_i \exp\Big(-\sum_s c_s(i)\Big) = \sum_i \exp\Big(-y_i \sum_s A_s(i)\Big)$$

This is an upper bound on the number of incorrectly classified examples: if $y_i \ne \mathrm{sign}\big[\sum_s A_s(i)\big]$ $\big(= \mathrm{majority}\{A_1(i), A_2(i), \dots\}\big)$, then $y_i \sum_s A_s(i) < 0$, so $\exp\big(-y_i \sum_s A_s(i)\big) \ge 1$.

Therefore the number of incorrectly classified examples is at most $m \prod_s N_s$, which is exponentially small in $t$.
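A sketch of the final step, assuming ±1 labels `y` and a list `A` of per-round ±1 predictions (both hypothetical toy inputs). It counts rows where the majority vote is wrong and compares that count with $\sum_i \exp(-y_i\sum_s A_s(i))$. The slides do not say how fast $\prod_s N_s$ shrinks; as one standard way to make the dependence on $\eta$ concrete, the sketch also evaluates a round's normalizer when the weighted error is $\frac12-\eta$ and weights change by $e^{\mp\alpha}$. With AdaBoost's step $\alpha = \frac12\ln\frac{1+2\eta}{1-2\eta}$ (an assumption here, not the slides' factor e), $N_t = \sqrt{1-4\eta^2} \le e^{-2\eta^2}$.

```python
import math
import random


def mistakes_and_bound(y, A):
    """Count majority-vote mistakes and the exponential sum that upper-bounds them."""
    mistakes, bound = 0, 0.0
    for i, yi in enumerate(y):
        margin = yi * sum(a[i] for a in A)   # y_i * sum_s A_s(i)
        if margin <= 0:                      # wrong majority (ties counted as wrong)
            mistakes += 1                    # ...and then exp(-margin) >= 1
        bound += math.exp(-margin)
    return mistakes, bound


def round_normalizer(eta, alpha):
    """N_t when mass 1/2 + eta is scaled by e^-alpha and mass 1/2 - eta by e^+alpha."""
    return (0.5 + eta) * math.exp(-alpha) + (0.5 - eta) * math.exp(alpha)


def adaboost_alpha(eta):
    """AdaBoost's step size for advantage eta (assumed, not taken from the slides)."""
    return 0.5 * math.log((1 + 2 * eta) / (1 - 2 * eta))


rng = random.Random(1)
y = [rng.choice([1, -1]) for _ in range(200)]
# Each of 9 rounds agrees with the label on a row with probability 0.6 (toy data).
A = [[yi if rng.random() < 0.6 else -yi for yi in y] for _ in range(9)]
mistakes, bound = mistakes_and_bound(y, A)
```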


[Figure: the boosting loop for people. Initially D is uniform on DB rows; S (labeled examples from D) → Base Learner → A_1, A_2, … (each does well on $\tfrac12 + \eta$ of D) → Update D; combine A_1, A_2, … by majority.]



Private Boosting for People

The base learner must be differentially private. The main concern is rows whose weight grows too large; this affects the termination test, sampling, and re-normalizing.

This is similar to the problem that arises when learning in the presence of noise, and has a similar solution: smooth boosting.

Remove (give up on) elements that become too heavy. Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy. Ensure this is rare (otherwise we give up on too many elements and hurt accuracy).
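The cascade warned about above is easy to see in a naive version of “give up on heavy elements.” This is a hypothetical sketch, not the Iterative Smoothing procedure the talk defers: it repeatedly drops the heaviest element whose normalized weight exceeds an assumed cap, re-normalizes, and checks again.

```python
def give_up_on_heavy(weights, cap):
    """Naively remove elements whose normalized weight exceeds `cap`, one at a time.

    Returns (distribution over remaining indices, list of removed indices in order).
    """
    active = {i: w for i, w in enumerate(weights) if w > 0}
    removed = []
    while active:
        total = sum(active.values())
        heavy = [i for i, w in active.items() if w / total > cap]
        if not heavy:
            break
        worst = max(heavy, key=lambda i: active[i])   # drop the heaviest first
        removed.append(worst)
        del active[worst]                             # re-normalized on the next pass
    total = sum(active.values())
    dist = {i: w / total for i, w in active.items()}
    return dist, removed


dist, removed = give_up_on_heavy([4, 3, 1, 1, 1], cap=0.35)
```

Only element 0 starts above the cap (0.4), but once it is removed, element 1 rises from 0.3 to 0.5 and must go too; the remaining three rows end at 1/3 each. With a cap below $1/m$, every element is eventually given up on.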

Iterative smoothing: not today.

Boosting for Queries?

Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that for all $q \in Q$ one can extract from O an approximation of $q(DB)$.

Assume the existence of an $(\varepsilon_0, \delta_0)$-dp base learner producing an object O that does well on more than half of D:

$$\Pr_{q \sim D}\big[\,|q(O) - q(DB)| < \lambda\,\big] > \tfrac12 + \eta$$
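To make the weak-learning condition concrete, here is a sketch for counting queries, the one case with a known base learner. It assumes (hypothetically) that rows are bit tuples, that query $q_j$ returns the fraction of rows with bit $j$ set, and that the object O is itself a small synthetic database; it computes $\Pr_{q\sim D}[|q(O)-q(DB)|<\lambda]$ over a finite Q.

```python
def counting_query(j):
    """q_j(db) = fraction of rows whose j-th bit is 1 (a normalized counting query)."""
    return lambda db: sum(row[j] for row in db) / len(db)


def good_mass(dist, queries, db, obj, lam):
    """Pr_{q ~ D}[ |q(O) - q(DB)| < lam ] for a finite list of queries."""
    return sum(p for p, q in zip(dist, queries) if abs(q(obj) - q(db)) < lam)


def does_well(dist, queries, db, obj, lam, eta):
    """The base learner's guarantee: mass of well-answered queries exceeds 1/2 + eta."""
    return good_mass(dist, queries, db, obj, lam) > 0.5 + eta


DB = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]
O = [(1, 0, 1), (1, 0, 0)]                 # hypothetical synthetic database
Q = [counting_query(j) for j in range(3)]
D = [1 / 3] * 3                            # uniform on Q
```

Here O's errors are 0.25, 0.25, and 0 on the three queries, so it does well for $\lambda = 0.3$ but not for $\lambda = 0.1$.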

[Figure: the boosting loop for queries. Initially D is uniform on Q; S (labeled examples from D) → Base Learner → A_1, A_2, … (each does well on $\tfrac12 + \eta$ of D) → Update D; combine A_1, A_2, … by median.]


An individual can affect many queries at once!


Privacy is Problematic

In smooth boosting for people, at each round an individual has only a small effect on the probability distribution.

In boosting for queries, an individual can affect the quality of $q(A_t)$ simultaneously for many q. As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different $A_t$’s.

This is slightly ameliorated by sampling (if there are only a few samples, maybe we can avoid the q’s on the edge?).

How can we make the re-weighting less sensitive?

Private Boosting for Queries [Variant of AdaBoost]

Initial distribution D is uniform on the queries in Q. S is always a sample of k queries drawn from $D^k$ (so $S \in Q^k$).

Combiner is median [viz. Freund92].

Weight update for queries:

If very well approximated by $A_t$ (error at most $\lambda$), decrease the weight by a factor of e (“−1”).

If very poorly approximated by $A_t$ (error at least $\lambda + \mu$), increase the weight by a factor of e (“+1”).

In between, scale with the distance from the midpoint $\lambda + \mu/2$ (down or up):

$$\frac{2\,\big(|q(DB) - q(A_t)| - (\lambda + \mu/2)\big)}{\mu} \qquad (\text{sensitivity } 2\rho/\mu)$$
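A sketch of the query re-weighting and the median combiner, assuming `err` is $|q(DB) - q(A_t)|$ for each query and that the exponent change is the formula above clipped to $[-1, +1]$. The function names are hypothetical. The last computation illustrates the sensitivity claim: changing one row moves $q(DB)$ by at most $\rho$, hence (triangle inequality) moves `err` by at most $\rho$, and so moves the exponent change by at most $2\rho/\mu$.

```python
import math
import statistics


def exponent_change(err, lam, mu):
    """-1 if err <= lam, +1 if err >= lam + mu, else 2(err - (lam + mu/2))/mu."""
    u = 2.0 * (err - (lam + mu / 2.0)) / mu
    return max(-1.0, min(1.0, u))


def update_query_weights(dist, errors, lam, mu):
    """Multiply each query's weight by exp(exponent change), then re-normalize."""
    raw = [p * math.exp(exponent_change(e, lam, mu)) for p, e in zip(dist, errors)]
    total = sum(raw)
    return [w / total for w in raw]


def combine_median(answers):
    """Answer a query by the median of q(A_1), ..., q(A_T)."""
    return statistics.median(answers)


lam, mu, rho = 0.1, 0.2, 0.01
D = update_query_weights([0.25] * 4, [0.0, 0.2, 0.3, 0.5], lam, mu)
# Largest change in the exponent when err shifts by rho (one changed row):
worst = max(abs(exponent_change(e + rho, lam, mu) - exponent_change(e, lam, mu))
            for e in [x / 1000 for x in range(1000)])
```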


Theorem (minus some parameters). Let all $q \in Q$ have sensitivity $\le \rho$. Run the query-boost algorithm for $T = \log|Q|/\eta^2$ rounds with

$$\mu = \frac{(\log|Q|/\eta^2)^2\,\rho\,\sqrt{k}}{\varepsilon}.$$

The resulting object O is $\big((\varepsilon + T\varepsilon_0),\, T\delta_0\big)$-dp and, whp, gives $(\lambda+\mu)$-accurate answers to all the queries in Q.

Better privacy (smaller $\varepsilon$) gives worse utility (larger $\mu$). A better base learner (smaller k, larger $\eta$) helps.
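The theorem's parameters are plain arithmetic. This sketch plugs numbers into the formulas exactly as stated (the slide omits constants and some parameters, and the log base is unspecified, so the natural log is an assumption; the function name is hypothetical).

```python
import math


def query_boost_parameters(num_queries, eta, rho, k, eps, eps0, delta0):
    """T, mu, and the (epsilon, delta) privacy guarantee, per the slide's formulas."""
    log_q = math.log(num_queries)                     # natural log (assumption)
    T = log_q / eta ** 2                              # number of rounds
    mu = (log_q / eta ** 2) ** 2 * rho * math.sqrt(k) / eps
    privacy = (eps + T * eps0, T * delta0)            # ((eps + T eps0), T delta0)-dp
    return T, mu, privacy


T, mu, (eps_total, delta_total) = query_boost_parameters(
    num_queries=10 ** 6, eta=0.25, rho=1.0, k=16, eps=1.0, eps0=0.001, delta0=1e-9)
_, mu_half_eps, _ = query_boost_parameters(10 ** 6, 0.25, 1.0, 16, 0.5, 0.001, 1e-9)
```

Halving $\varepsilon$ doubles $\mu$, while a larger $\eta$ or a smaller k shrinks it, matching the trade-offs stated above.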

Proving Privacy

Technique #1: Pay Your Debt and Move On

Fix $A_1, A_2, \dots, A_t$ (record the D vs. D′ confidence gain): “Pay Your Debt”. Focus on the gain in the selection of $S \in Q^k$ in round $t+1$: “Move On”.

This gain is based on the distributions $D_{t+1}$ and $D'_{t+1}$ determined in round $t$; call them $D, D'$.

Technique #2: Evolution of Confidence [DiDwN03]: “Delay Payment Until Final Reckoning”. Choose $q_1, q_2, \dots$ in turn.

For each $q \in Q$, bound $|\ln(D[q]/D'[q])| \le A$ and bound the expectation $|\mathbb{E}_{q\sim D} \ln(D[q]/D'[q])| \le B$. Then

$$\Pr_{q_1,\dots,q_k}\Big[\,\Big|\sum_i \ln\frac{D[q_i]}{D'[q_i]}\Big| > z\sqrt{k}\,(A+B) + kB\Big] < \exp(-z^2/2)$$
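The concentration statement can be probed by simulation. This sketch (hypothetical helper, toy distributions D and D′) takes the tightest $A = \max_q|\ln(D[q]/D'[q])|$ and $B = |\mathbb{E}_{q\sim D}\ln(D[q]/D'[q])|$, draws $q_1,\dots,q_k \sim D$ many times, and reports how often the sum exceeds $z\sqrt{k}(A+B)+kB$, next to $\exp(-z^2/2)$. A simulation only illustrates the bound; it does not prove it.

```python
import math
import random


def confidence_tail(D, Dp, k, z, trials, seed=0):
    """Empirical Pr_{q_1..q_k ~ D}[ |sum_i ln(D[q_i]/D'[q_i])| > z*sqrt(k)*(A+B) + k*B ].

    Returns (empirical frequency, exp(-z^2/2)).
    """
    rng = random.Random(seed)
    logs = [math.log(p / pp) for p, pp in zip(D, Dp)]
    A = max(abs(x) for x in logs)                          # bound on |ln(D[q]/D'[q])|
    B = abs(sum(p * x for p, x in zip(D, logs)))           # |E_{q~D} ln(D[q]/D'[q])|
    threshold = z * math.sqrt(k) * (A + B) + k * B
    hits = 0
    for _ in range(trials):
        qs = rng.choices(range(len(D)), weights=D, k=k)    # q_1, ..., q_k drawn from D
        if abs(sum(logs[q] for q in qs)) > threshold:
            hits += 1
    return hits / trials, math.exp(-z * z / 2)


D = [0.4, 0.3, 0.2, 0.1]
Dp = [0.38, 0.31, 0.2, 0.11]
freq, bound = confidence_tail(D, Dp, k=100, z=1.5, trials=2000)
```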


Bounding $\mathbb{E}_{q\sim D} \ln(D[q]/D'[q])$

Assume D, D′ are A-dp with respect to one another, for $A < 1$. Then $0 \le \mathbb{E}_{q\sim D} \ln[D(q)/D'(q)] \le 2A^2$ (that is, $B \le 2A^2$).

$KL(D\|D') = \sum_q \ln[D(q)/D'(q)]\, D(q)$; always $\ge 0$.

So,

$$\begin{aligned}
KL(D\|D') &\le KL(D\|D') + KL(D'\|D) = \sum_q D(q)\Big(\ln\frac{D(q)}{D'(q)} + \ln\frac{D'(q)}{D(q)}\Big) + \big(D'(q)-D(q)\big)\ln\frac{D'(q)}{D(q)} \\
&\le \sum_q 0 + |D'(q)-D(q)|\,A \\
&= A\sum_q \big[\max(D(q),D'(q)) - \min(D(q),D'(q))\big] \\
&\le A\sum_q \big[e^A \min(D(q),D'(q)) - \min(D(q),D'(q))\big] \\
&= A\sum_q (e^A - 1)\min(D(q),D'(q)) \\
&\le 2A^2 \quad \text{when } A < 1.
\end{aligned}$$

Compare DiDwN03.
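A numeric spot check of this chain, under the assumption that “A-dp with respect to one another” means $|\ln(D(q)/D'(q))| \le A$ for every q. The hypothetical helper builds such a pair by tilting a random D by factors $e^{u_q}$ with $|u_q| \le A/2$ and re-normalizing (the normalizer lies in $[e^{-A/2}, e^{A/2}]$, so every ratio stays within $e^{\pm A}$), then compares $KL(D\|D')$ with $2A^2$.

```python
import math
import random


def kl(P, Q):
    """KL(P || Q) = sum_q P(q) ln(P(q)/Q(q)), assuming Q is strictly positive."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)


def a_close_pair(n, A, seed=0):
    """Random D and a tilted D' with |ln(D(q)/D'(q))| <= A for all q."""
    rng = random.Random(seed)
    raw = [rng.random() + 0.01 for _ in range(n)]
    s = sum(raw)
    D = [x / s for x in raw]
    tilted = [p * math.exp(rng.uniform(-A / 2, A / 2)) for p in D]
    z = sum(tilted)
    Dp = [x / z for x in tilted]
    return D, Dp


for A in (0.1, 0.5, 0.9):
    D, Dp = a_close_pair(30, A)
    sym = kl(D, Dp) + kl(Dp, D)                       # KL(D||D') + KL(D'||D)
    l1 = sum(abs(p - q) for p, q in zip(D, Dp))       # sum_q |D'(q) - D(q)|
    print(f"A={A}: KL={kl(D, Dp):.5f}  KL+KL'={sym:.5f} <= A*L1={A * l1:.5f}  "
          f"2A^2={2 * A * A:.3f}")
```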

Motivation and Application

Boosting for People: logistic regression for 3000+ dimensional data. A slight twist on CM did pretty well ($\varepsilon = 1.5$); we thought about alternatives.

Boosting for Queries: reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil’s talk). We over-interpreted the polytime DiNi-style attacks (we were spoiled).

Can’t have $cn$ queries with error $o(\sqrt{n})$. BLR08: can have $cn$ queries with error $O(n^{2/3})$. DNRRV09: $O(n^{1/2}|Q|^{o(1)})$. Now: $O(n^{1/2}\log^2|Q|)$.

The result is more general; we only know of a base learner for counting queries.


