Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
TRANSCRIPT
Differential Privacy and Machine Learning
Aaron Roth
March 24, 2017
Protecting Privacy is Important
Class action lawsuit accuses AOL of violating the Electronic Communications Privacy Act, seeks $5,000 in damages per user. AOL’s director of research is fired.
Protecting Privacy is Important
Class action lawsuit (Doe v. Netflix) accuses Netflix of violating the Video Privacy Protection Act, seeks $2,000 in compensation for each of Netflix’s 2,000,000 subscribers. Settled for undisclosed sum, 2nd Netflix Challenge is cancelled.
Protecting Privacy is Important
The National Human Genome Research Institute (NHGRI) immediately restricted pooled genomic data that had previously been publicly available.
So What is Differential Privacy?
So What is Differential Privacy?
• Differential Privacy is about promising people freedom from harm. “An analysis of a dataset D is differentially private if the data analyst knows almost no more about Alice after the analysis than he would have known had he conducted the same analysis on an identical data set with Alice’s data removed.”
Differential Privacy [Dwork-McSherry-Nissim-Smith 06]

[Figure: a dataset $D$ (Alice, Bob, Chris, Donna, Ernie, …, Xavier) is fed into an Algorithm; for every output $r$, the probability $\Pr[r]$ changes by at most a bounded ratio when one individual in $D$ is replaced.]
Differential Privacy
$\mathcal{X}$: The data universe. $D \in \mathcal{X}^n$: The dataset (one element per person)

Definition: Two datasets $D, D' \in \mathcal{X}^n$ are neighbors if they differ in the data of a single individual.
Differential Privacy
$\mathcal{X}$: The data universe. $D \in \mathcal{X}^n$: The dataset (one element per person)

Definition: An algorithm $M$ is $\varepsilon$-differentially private if for all pairs of neighboring datasets $D, D'$, and for all outputs $r$: $\Pr[M(D) = r] \le e^{\varepsilon} \cdot \Pr[M(D') = r]$
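To make the definition concrete, here is a minimal sketch (not from the slides) of randomized response, the classic $\varepsilon$-differentially private mechanism: each person reports their true bit with probability $e^{\varepsilon}/(1+e^{\varepsilon})$ and the flipped bit otherwise, so the output probabilities under the two possible inputs differ by a factor of at most $e^{\varepsilon}$. Function names here are illustrative.

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report true_bit with probability e^eps / (1 + e^eps); otherwise flip it.
    The likelihood ratio between the two possible inputs is exactly e^eps,
    so each report is epsilon-differentially private on its own."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit

def estimate_fraction(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of 1s."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)
```

Because each report is already private, anything computed from the reports (such as the debiased estimate) stays private by post-processing.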
Some Useful Properties
Theorem (Postprocessing): If $M$ is $\varepsilon$-private, and $f$ is any (randomized) function, then $f \circ M$ is $\varepsilon$-private.
So…

[Figure: an example dataset $x$, shown across three animation frames with successive individuals' entries redacted — the definition applies to every such neighboring pair.]

Definition: An algorithm $M$ is $\varepsilon$-differentially private if for all pairs of neighboring datasets $D, D'$, and for all outputs $r$: $\Pr[M(D) = r] \le e^{\varepsilon} \cdot \Pr[M(D') = r]$
Some Useful Properties
Theorem (Composition): If $M_1, \dots, M_k$ are $\varepsilon_1, \dots, \varepsilon_k$-private, then the composition $M(D) = (M_1(D), \dots, M_k(D))$ is $\left(\sum_{i=1}^{k} \varepsilon_i\right)$-private.
So…

You can go about designing algorithms as you normally would. Just access the data using differentially private "subroutines", and keep track of your "privacy budget" as a resource. Private algorithm design, like regular algorithm design, can be modular.
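The budget bookkeeping can be as simple as an accumulator that refuses further queries once the total $\varepsilon$ is spent. A hypothetical sketch (class and function names are mine, not from the talk) using basic composition and the Laplace mechanism as the private subroutine:

```python
import numpy as np

rng = np.random.default_rng(0)

class PrivacyBudget:
    """Track cumulative epsilon spent, under basic composition."""
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

def private_count(data, predicate, epsilon, budget):
    """A counting query (sensitivity 1) answered with Laplace noise,
    charged against a shared budget before the data is touched."""
    budget.charge(epsilon)
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(0.0, 1.0 / epsilon)
```

Each subroutine charges the budget; the composition theorem guarantees the whole analysis is private as long as the accumulator never exceeds its total.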
Some simple operations: Answering Numeric Queries

Def: A numeric function $f : \mathcal{X}^n \to \mathbb{R}$ has sensitivity $\Delta$ if for all neighboring $D, D'$: $|f(D) - f(D')| \le \Delta$. Write $\Delta(f)$.
• e.g. "How many software engineers are in the room?" has sensitivity 1.
• "What fraction of people in the room are software engineers?" has sensitivity $1/n$.
Some simple operations: Answering Numeric Queries

The Laplace Mechanism: $M_{\mathrm{Lap}}(D, f) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta(f)}{\varepsilon}\right)$

Theorem: $M_{\mathrm{Lap}}$ is $\varepsilon$-private.
Some simple operations: Answering Numeric Queries

The Laplace Mechanism: $M_{\mathrm{Lap}}(D, f) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta(f)}{\varepsilon}\right)$

Theorem: The expected error is $O\!\left(\frac{\Delta(f)}{\varepsilon}\right)$ (can answer "what fraction of people in the room are software engineers?" with error $\approx \frac{1}{\varepsilon n}$)
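A minimal sketch of the Laplace mechanism as defined above, applied to the two example queries (the function name and the toy room data are mine):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Answer a numeric query by adding Laplace(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# "How many software engineers are in the room?" -- a counting query, sensitivity 1
room = ["engineer", "designer", "engineer", "manager", "engineer"]
true_count = sum(1 for job in room if job == "engineer")
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)

# "What fraction are software engineers?" -- sensitivity 1/n, error ~ 1/(eps * n)
n = len(room)
noisy_frac = laplace_mechanism(true_count / n, sensitivity=1.0 / n, epsilon=0.5)
```

Note how the noise scale tracks the sensitivity: the fraction query, with sensitivity $1/n$, gets proportionally smaller noise than the raw count.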
Some simple operations: Answering Non-numeric Queries

"What is the modal eye color in the room?"

• If you can define a function $q(D, r)$ that determines how "good" each outcome $r$ is for a fixed input $D$:
  – E.g. $q(D, \text{red}) =$ "fraction of people in $D$ with red eyes"
Some simple operations: Answering Non-numeric Queries

$M_{\mathrm{Exp}}(D)$: Output $r$ w.p. $\propto \exp\!\left(\frac{\varepsilon\, q(D, r)}{2\Delta}\right)$

Theorem: $M_{\mathrm{Exp}}$ is $\varepsilon$-private, and outputs $r$ such that: $q(D, r) \ge \max_{r'} q(D, r') - O\!\left(\frac{\Delta \ln |R|}{\varepsilon}\right)$

(can find a color that has frequency within $\approx \frac{\ln |R|}{\varepsilon n}$ of the modal color in the room)
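A sketch of the exponential mechanism for the modal-eye-color query, following the sampling rule above (helper names and the toy data are mine):

```python
import numpy as np

def exponential_mechanism(data, outcomes, quality, sensitivity, epsilon, rng=None):
    """Sample an outcome r with probability proportional to
    exp(epsilon * q(data, r) / (2 * sensitivity))."""
    rng = rng or np.random.default_rng()
    scores = np.array([quality(data, r) for r in outcomes], dtype=float)
    # Subtract the max score before exponentiating, for numerical stability.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return outcomes[rng.choice(len(outcomes), p=probs)]

# q(D, color) = fraction of people in D with that eye color; sensitivity 1/n
eyes = ["brown"] * 70 + ["blue"] * 20 + ["green"] * 10
frac = lambda data, color: sum(1 for e in data if e == color) / len(data)
winner = exponential_mechanism(eyes, ["brown", "blue", "green"], frac,
                               sensitivity=1.0 / len(eyes), epsilon=1.0)
```

With a clear modal color and moderate $\varepsilon$, the mechanism returns the true mode almost always, yet never needs to release the underlying counts.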
So what can we do with that?
Empirical Risk Minimization* (*i.e. almost all of supervised learning)

Find $\theta$ to minimize: $L(\theta, D) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta, x_i, y_i)$
So what can we do with that?
Empirical Risk Minimization — Simple Special Case: Linear Regression

Find $\theta$ to minimize: $L(\theta, D) = \frac{1}{n} \sum_{i=1}^{n} \left(\langle \theta, x_i \rangle - y_i\right)^2$
So what can we do with that?
• First Attempt: Try the exponential mechanism!
• Define $q(D, \theta) = -\frac{1}{n}\sum_{i=1}^{n} \left(\langle \theta, x_i \rangle - y_i\right)^2$
• Output $\theta$ w.p. $\propto \exp\!\left(\frac{\varepsilon\, q(D, \theta)}{2\Delta}\right)$ – How?
• In this case, sampling reduces to: Output $\theta \sim \mathcal{N}(\hat\theta, \Sigma)$, a Gaussian centered at the least-squares solution $\hat\theta = (X^\top X)^{-1} X^\top y$, with covariance $\Sigma$ proportional to $\frac{\Delta}{\varepsilon}(X^\top X)^{-1}$ (the exponential of a negative quadratic in $\theta$ is a Gaussian density).
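Assuming the Gaussian reduction above, the sampler is a few lines; the covariance scale (standing in for the $\Delta/\varepsilon$ calibration, which is not derived here) is left as an explicit parameter, so this is an illustrative sketch rather than a calibrated mechanism:

```python
import numpy as np

def sample_private_regression(X, y, scale, rng=None):
    """Sample theta from N(theta_hat, scale * (X^T X)^{-1}) -- the exponential
    mechanism's output distribution when q is the negative squared loss.
    `scale` is an assumed stand-in for the Delta/epsilon calibration."""
    rng = rng or np.random.default_rng()
    xtx = X.T @ X
    theta_hat = np.linalg.solve(xtx, X.T @ y)   # ordinary least-squares solution
    cov = scale * np.linalg.inv(xtx)
    return rng.multivariate_normal(theta_hat, cov)
```

Smaller `scale` (i.e. larger $\varepsilon$) concentrates the samples around the non-private least-squares fit; larger `scale` buys more privacy at the cost of accuracy.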
So what can we do with that?
Empirical Risk Minimization — The General Case (e.g. deep learning)

Find $\theta$ to minimize: $L(\theta, D) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta, x_i, y_i)$
Stochastic Gradient Descent
Convergence depends on the fact that at each round: $\mathbb{E}[\hat g_t] = \nabla L(\theta_t, D)$

Let $\theta_1 = 0$. For $t = 1$ to $T$:
  Pick $i \in \{1, \dots, n\}$ at random. Let $\hat g_t = \nabla_\theta \ell(\theta_t, x_i, y_i)$. Let $\theta_{t+1} = \theta_t - \eta\, \hat g_t$.
Private Stochastic Gradient Descent
Still have: $\mathbb{E}[\hat g_t] = \nabla L(\theta_t, D)$ (Can still prove convergence theorems, and run the algorithm…)

Privacy guarantees can be computed from:
1) The privacy of the Laplace mechanism,
2) Preservation of privacy under post-processing, and
3) Composition of privacy guarantees.

Let $\theta_1 = 0$. For $t = 1$ to $T$:
  Pick $i$ at random. Let $\hat g_t = \nabla_\theta \ell(\theta_t, x_i, y_i) + \mathrm{Lap}\!\left(\frac{\Delta}{\varepsilon}\right)$. Let $\theta_{t+1} = \theta_t - \eta\, \hat g_t$.
What else can we do?
• Statistical Estimation
• Graph Analysis
• Combinatorial Optimization
• Spectral Analysis of Matrices
• Anomaly Detection/Analysis of Data Streams
• Convex Optimization
• Equilibrium computation
• Computation of optimal 1-sided and 2-sided matchings
• Pareto Optimal Exchanges
• …
Differential Privacy and Learning

Theorem*: An $\varepsilon$-differentially private algorithm cannot overfit its training set by more than $\approx \varepsilon$.
*Lots of interesting details missing! See our paper in Science.
Thresholdout [DFHPRR15]

thresholdout.py:

    from numpy import *

    def Thresholdout(sample, holdout, q, sigma, threshold):
        sample_mean = mean([q(x) for x in sample])
        holdout_mean = mean([q(x) for x in holdout])
        if abs(sample_mean - holdout_mean) < random.normal(threshold, sigma):
            # q does not overfit: release the raw training-set estimate
            return sample_mean
        else:
            # q overfits: release a noisy holdout estimate instead
            return holdout_mean + random.normal(0, sigma)
Reusable holdout example
• Data set with 2n = 20,000 rows and d = 10,000 variables. Class labels in {-1, 1}.
• Analyst performs stepwise variable selection:
  1. Split data into training/holdout of size n
  2. Select "best" k variables on training data
  3. Only use variables also good on holdout
  4. Build linear predictor out of k variables
  5. Find best k = 10, 20, 30, …
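To see the phenomenon in miniature, here is a sketch of the no-signal setup: pick the feature most correlated with the label on the training set, then re-estimate its correlation through the Thresholdout logic. This is a simplified illustration with made-up sizes and parameters, not the paper's exact experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

def thresholdout_mean(sample_vals, holdout_vals, sigma, threshold):
    """Same logic as the Thresholdout routine above, on precomputed per-row values."""
    sample_mean = np.mean(sample_vals)
    holdout_mean = np.mean(holdout_vals)
    if abs(sample_mean - holdout_mean) < rng.normal(threshold, sigma):
        return sample_mean                           # query does not overfit
    return holdout_mean + rng.normal(0.0, sigma)     # query overfits

n, d = 2000, 1000
X_train = rng.choice([-1, 1], size=(n, d))
y_train = rng.choice([-1, 1], size=n)   # labels independent of features: no signal
X_hold = rng.choice([-1, 1], size=(n, d))
y_hold = rng.choice([-1, 1], size=n)

# The feature that looks most label-correlated on the training set...
corrs = X_train.T @ y_train / n
best = int(np.argmax(np.abs(corrs)))
naive = abs(corrs[best])                # training-set estimate: spuriously large

# ...re-estimated through Thresholdout: close to the true correlation of zero.
private = thresholdout_mean(X_train[:, best] * y_train,
                            X_hold[:, best] * y_hold,
                            sigma=0.01, threshold=0.02)
```

The naive training estimate is inflated by the selection step; Thresholdout detects the train/holdout gap and returns a noisy holdout estimate near the true value of zero.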
Classification after feature selection
No signal: data are random Gaussians; labels are drawn independently at random from {-1, 1}.
Thresholdout correctly detects overfitting!
Strong signal: 20 features are mildly correlated with the target; the remaining attributes are uncorrelated.

Thresholdout correctly detects the right model size!
Classification after feature selection
So…
• Differential privacy provides:
  – A rigorous, provable guarantee with a strong privacy semantics.
  – A set of tools and composition theorems that allow for modular, easy design of privacy-preserving algorithms.
  – Protection against overfitting even when privacy is not a concern.
Thanks!
To learn more:
• Our textbook on differential privacy:
  – Available for free on my website: http://www.cis.upenn.edu/~aaroth
• Connections between Privacy and Overfitting:
  – Dwork, Feldman, Hardt, Pitassi, Reingold, Roth, "The Reusable Holdout: Preserving Validity in Adaptive Data Analysis", Science, August 7, 2015.
  – Dwork, Feldman, Hardt, Pitassi, Reingold, Roth, "Preserving Statistical Validity in Adaptive Data Analysis", STOC 2015.
  – Bassily, Nissim, Smith, Steinke, Stemmer, Ullman, "Algorithmic Stability for Adaptive Data Analysis", STOC 2016.
  – Rogers, Roth, Smith, Thakkar, "Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing", FOCS 2016.
  – Cummings, Ligett, Nissim, Roth, Wu, "Adaptive Learning with Robust Generalization Guarantees", COLT 2016.