TRANSCRIPT
Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
Boosting [Schapire, 1989]
A general method for improving the accuracy of any given learning algorithm.
Example: learning to recognize spam e-mail. A "base learner" receives labeled examples and outputs a heuristic; labels are {+1, -1}. Run the base learner many times and combine the resulting heuristics.
[Figure: the boosting loop. The base learner receives S, labeled examples drawn from the current distribution D, and outputs hypotheses A1, A2, …, each doing well on a (1/2 + η)-fraction of D. After each round D is updated and a termination test is applied; on termination, A1, A2, … are combined into the final hypothesis A.]
[The same figure, annotated with the question: how should D be updated, and how do we combine and decide termination?]
Boosting for People [variant of AdaBoost, FS95]
- Initial distribution D is uniform on database rows.
- S is always a set of k elements drawn i.i.d. from D (S ∼ D^k).
- Combiner is majority.
- Weight update:
  - If correctly classified by the current A_t, decrease weight by a factor of e ("subtract 1 from the exponent").
  - If incorrectly classified by the current A_t, increase weight by a factor of e ("add 1 to the exponent").
  - Re-normalize to obtain the updated D.
(A minimal sketch of this loop follows.)
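A minimal, non-private Python sketch of the loop above, assuming a base_learner callable that maps a labeled sample to a {+1, -1}-valued hypothesis; the termination test is simplified to a fixed round count T, and all names are illustrative rather than from the talk:

```python
import math
import random

def boost_for_people(rows, labels, base_learner, k, T):
    """AdaBoost-style loop from the slide: sample k examples from D,
    run the base learner, update exponents by -1/+1, re-normalize."""
    m = len(rows)
    exponent = [0.0] * m  # weight of row i is exp(exponent[i])
    hypotheses = []
    for _ in range(T):
        # Re-normalize to obtain the current distribution D.
        weights = [math.exp(e) for e in exponent]
        total = sum(weights)
        D = [w / total for w in weights]
        # S: k labeled examples drawn i.i.d. from D (S ~ D^k).
        idx = random.choices(range(m), weights=D, k=k)
        h = base_learner([(rows[i], labels[i]) for i in idx])
        hypotheses.append(h)
        # Correct: subtract 1 from the exponent; incorrect: add 1.
        for i in range(m):
            exponent[i] += -1.0 if h(rows[i]) == labels[i] else 1.0
    # Combiner: majority vote over the collected hypotheses.
    return lambda row: 1 if sum(h(row) for h in hypotheses) >= 0 else -1
```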
Why Does it Work?
Update rule: multiply the weight of element i by exp(-c_t(i)), where c_t(i) = +1 if A_t is correct on i and c_t(i) = -1 otherwise. With m rows, D_1 uniform, and N_t the round-t normalizer:

D_{t+1}(i) = D_t(i) · exp(-c_t(i)) / N_t
N_t · D_{t+1}(i) = D_t(i) · exp(-c_t(i))
N_t N_{t-1} ⋯ N_1 · D_{t+1}(i) = D_1(i) · exp(-∑_s c_s(i))
(∏_s N_s) · D_{t+1}(i) = (1/m) · exp(-∑_s c_s(i))
∑_i (∏_s N_s) · D_{t+1}(i) = (1/m) · ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) · ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) · ∑_i exp(-∑_s c_s(i)), and ∏_s N_s shrinks exponentially (at a rate depending on η):
- The normalizers are sums of weights, and at the start of each round the weights sum to 1.
- Because the base learner is good, "more" weight decreases than increases: more mass has its exponent shrink than otherwise.

Also, ∑_i exp(-∑_s c_s(i)) = ∑_i exp(-y_i ∑_s A_s(i)), since c_s(i) = y_i A_s(i). This is an upper bound on the number of incorrectly classified examples:
- If y_i ≠ sign[∑_s A_s(i)] (= majority{A_1(i), A_2(i), …}), then y_i ∑_s A_s(i) < 0, so exp(-y_i ∑_s A_s(i)) ≥ 1.
Therefore the number of incorrectly classified examples is exponentially small in t. (A numeric check follows.)
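The telescoping identity and the counting bound can be checked numerically. A toy Python run, with a simulated correctness pattern and a deliberately strong base learner so the factor-of-e update visibly shrinks ∏_s N_s (all parameters are made up for illustration):

```python
import math
import random

random.seed(0)
m, T, p_correct = 100, 40, 0.9  # strong base learner, for illustration
D = [1.0 / m] * m               # D_1: uniform on the m rows
prod_N = 1.0
c_sum = [0.0] * m               # c_sum[i] = sum_s c_s(i)
for _ in range(T):
    correct = [random.random() < p_correct for _ in range(m)]
    new_w = [D[i] * math.exp(-1.0 if correct[i] else 1.0) for i in range(m)]
    N = sum(new_w)              # normalizer N_t
    prod_N *= N
    D = [w / N for w in new_w]
    for i in range(m):
        c_sum[i] += 1.0 if correct[i] else -1.0

# Identity: prod_s N_s = (1/m) * sum_i exp(-sum_s c_s(i)).
print(prod_N, sum(math.exp(-c) for c in c_sum) / m)
# Bound: # misclassified by majority vote <= m * prod_s N_s.
print(sum(1 for c in c_sum if c <= 0), "<=", m * prod_N)
```

The identity holds for any ±1 pattern of c_s(i); only the shrinkage of ∏_s N_s uses the base learner's advantage.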
[Figure: the boosting loop instantiated for people. Initially D is uniform on the DB rows; the weight update is the -1/+1-in-the-exponent rule followed by re-normalization; the combiner is majority. Annotation: Privacy?]
Private Boosting for People
- The base learner must be differentially private.
- The main concern is rows whose weight grows too large; this affects the termination test, the sampling, and the re-normalization.
- This is similar to a problem arising when learning in the presence of noise, and it has a similar solution: smooth boosting.
  - Remove (give up on) elements that become too heavy. Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy…
  - Ensure this is rare (else we give up on too many elements and hurt accuracy).
Iterative smoothing: not today.
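The talk defers iterative smoothing, but the naive removal step just described can be sketched in Python (the cap parameter and all names are hypothetical, purely to fix ideas):

```python
def remove_heavy(weights, cap):
    """Repeatedly zero out elements whose normalized mass exceeds cap;
    the loop is needed because re-normalizing can push new elements
    over the cap (exactly the care the slide warns about)."""
    w = list(weights)
    while True:
        total = sum(w)
        assert total > 0, "cap too small: gave up on every element"
        heavy = [i for i, wi in enumerate(w) if wi / total > cap]
        if not heavy:
            return [wi / total for wi in w]  # smoothed distribution
        for i in heavy:
            w[i] = 0.0  # give up on this element
```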
Boosting for Queries?
Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q we can extract from O an approximation of q(DB).
Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D (see the sketch below):
Pr_{q ∼ D} [ |q(O) - q(DB)| < λ ] > 1/2 + η
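For concreteness, the base learner's guarantee as a quantity one could measure with access to DB (a hedged sketch; the query and object interfaces are assumptions, not from the talk):

```python
def advantage(queries, D, O, DB, lam):
    """Mass under D of the queries that O lambda-approximates;
    the assumed base learner guarantees this exceeds 1/2 + eta."""
    return sum(D[j] for j, q in enumerate(queries)
               if abs(q(O) - q(DB)) < lam)
```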
[Figure: the boosting loop instantiated for queries. Initially D is uniform on Q; the weight update is again -1/+1 in the exponent followed by re-normalization; the combiner is the median. Annotations: Privacy? An individual can affect many queries at once!]
Privacy is Problematic
In smooth boosting for people, at each round an individual has only a small effect on the probability distribution. In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q. As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's. This is slightly ameliorated by sampling (if only a few samples are taken, maybe we can avoid the q's on the edge?).
How can we make the re-weighting less sensitive?
Private Boosting for Queries [variant of AdaBoost]
- Initial distribution D is uniform on the queries in Q.
- S is always a set of k queries drawn i.i.d. from D (S ∼ D^k).
- Combiner is the median [viz. Freund92].
- Weight update for queries (sketched in code below):
  - If very well approximated by A_t (error at most λ), decrease weight by a factor of e ("-1").
  - If very poorly approximated by A_t (error at least λ + μ), increase weight by a factor of e ("+1").
  - In between, scale with the distance from the midpoint λ + μ/2 (down or up):
    2·( |q(DB) - q(A_t)| - (λ + μ/2) ) / μ    (sensitivity 2ρ/μ)
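A Python sketch of this re-weighting, reading "-1"/"+1" as the change to the query's exponent (the λ and λ + μ thresholds are the endpoints of the linear ramp; names are illustrative):

```python
def query_exponent_update(err, lam, mu):
    """Change to a query's exponent, given err = |q(DB) - q(A_t)|.
    Returns a value in [-1, +1]: -1 (weight shrinks by a factor of e)
    when well approximated, +1 when poorly approximated, and a
    linear ramp in between."""
    if err <= lam:
        return -1.0
    if err >= lam + mu:
        return +1.0
    # Scale with the distance from the midpoint lam + mu/2.
    return 2.0 * (err - (lam + mu / 2.0)) / mu
```

Since a single row changes q(DB) by at most ρ, it changes this value by at most 2ρ/μ: the sensitivity quoted above, and the reason this rule is less sensitive than the hard -1/+1 update.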
Theorem (minus some parameters)
Let all q ∈ Q have sensitivity ≤ ρ. Run the query-boosting algorithm for T = log|Q| / η² rounds with
μ = ( (log|Q| / η²)² · ρ · √k ) / ε
The resulting object O is (ε + Tε₀, Tδ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q.
- Better privacy (smaller ε) gives worse utility (larger μ).
- A better base learner (smaller k, larger η) helps.
Proving Privacy
Technique #1: Pay Your Debt and Move On.
- Fix A_1, A_2, …, A_t and record the D vs. D' confidence gain so far ("pay your debt").
- Focus on the gain in the selection of S ∈ Q^k in round t+1 ("move on").
- That selection is based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D, D'.
Technique #2: Evolution of Confidence [DiDwN03]: "delay payment until the final reckoning". Choose q₁, q₂, …, in turn.
- For each q ∈ Q, bound the worst-case log-ratio |ln( D[q] / D'[q] )| ≤ A and the expectation |E_{q∼D} ln( D[q] / D'[q] )| ≤ B.
- Then, by an Azuma-style martingale bound,
Pr_{q₁,…,q_k} [ |∑_i ln( D[q_i] / D'[q_i] )| > z√k·(A + B) + kB ] < exp(-z²/2)
(A toy simulation of this bound follows.)
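A toy Monte Carlo illustration of the tail bound (the distributions are made up; they matter only through the measured bounds A and B):

```python
import math
import random

random.seed(1)
D  = [0.30, 0.20, 0.25, 0.25]   # toy distributions over 4 queries
Dp = [0.25, 0.24, 0.26, 0.25]
logratio = [math.log(D[q] / Dp[q]) for q in range(4)]
A = max(abs(r) for r in logratio)                    # worst-case bound
B = abs(sum(D[q] * logratio[q] for q in range(4)))   # expectation bound

k, z, trials = 50, 2.0, 20000
bound = z * math.sqrt(k) * (A + B) + k * B
exceed = sum(
    1 for _ in range(trials)
    if abs(sum(logratio[q]
               for q in random.choices(range(4), weights=D, k=k))) > bound
)
print(exceed / trials, "<", math.exp(-z * z / 2))    # empirically holds
```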
Bounding E_{q∼D} ln( D[q] / D'[q] )
Assume D and D' are A-dp with respect to one another (so |ln( D(q)/D'(q) )| ≤ A for every q), with A < 1. Then
0 ≤ E_{q∼D} ln[ D(q)/D'(q) ] ≤ 2A²   (that is, B ≤ 2A²).

KL(D||D') = ∑_q D(q) · ln[ D(q)/D'(q) ]; always ≥ 0.
So:
KL(D||D') ≤ KL(D||D') + KL(D'||D)
= ∑_q D(q)·( ln[ D(q)/D'(q) ] + ln[ D'(q)/D(q) ] ) + ( D'(q) - D(q) )·ln[ D'(q)/D(q) ]
≤ ∑_q 0 + |D'(q) - D(q)|·A
= A · ∑_q [ max(D(q), D'(q)) - min(D(q), D'(q)) ]
≤ A · ∑_q [ e^A·min(D(q), D'(q)) - min(D(q), D'(q)) ]
= A · ∑_q (e^A - 1)·min(D(q), D'(q))
≤ 2A²  when A < 1  (since e^A - 1 ≤ 2A for A < 1 and ∑_q min(D(q), D'(q)) ≤ 1)
Compare DiDwN03. (A numeric spot-check follows.)
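A quick numeric spot-check of B ≤ 2A² on made-up nearby distributions (purely illustrative):

```python
import math
import random

random.seed(2)
n = 6
base = [random.random() + 0.5 for _ in range(n)]
D = [b / sum(base) for b in base]
# Perturb multiplicatively, then re-normalize, to get a nearby D'.
pert = [D[q] * math.exp(random.uniform(-0.25, 0.25)) for q in range(n)]
Dp = [p / sum(pert) for p in pert]

A = max(abs(math.log(D[q] / Dp[q])) for q in range(n))    # measured A < 1
kl = sum(D[q] * math.log(D[q] / Dp[q]) for q in range(n)) # KL(D||D')
assert 0.0 <= kl <= 2 * A * A
print(f"A = {A:.3f}, KL(D||D') = {kl:.5f}, 2A^2 = {2*A*A:.5f}")
```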
Motivation and Application
Boosting for people:
- Logistic regression for 3000+-dimensional data. A slight twist on CM did pretty well (ε = 1.5). Thought about alternatives.
Boosting for queries:
- Reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk). We had over-interpreted the polytime DiNi-style attacks (we were spoiled):
  - Can't have cn queries with error o(√n).
  - BLR08: can have cn queries with error O(n^{2/3}).
  - DNRRV09: O(n^{1/2} · |Q|^{o(1)}).
  - Now: O(n^{1/2} · log²|Q|).
- The result is more general, but we only know of a base learner for counting queries.