Differential Privacy Xintao Wu Oct 31, 2012


TRANSCRIPT

Page 1: Differential Privacy Xintao Wu Oct 31, 2012

Differential Privacy

Xintao Wu, Oct 31, 2012

Page 2: Differential Privacy Xintao Wu Oct 31, 2012

Sanitization approaches

• Input perturbation
  – Add noise to data
  – Generalize data

• Summary statistics
  – Means, variances
  – Marginal totals
  – Model parameters

• Output perturbation
  – Add noise to summary statistics

Page 3: Differential Privacy Xintao Wu Oct 31, 2012

Blending/hiding into a crowd

• K-anonymity based approaches

• Adversary may have various background knowledge to breach privacy

• Privacy models often assume “the adversary’s background knowledge is given”

Page 4: Differential Privacy Xintao Wu Oct 31, 2012

Classic intuition for privacy

• Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database.

• Security of encryption
  – Anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext.

• Prior and posterior views about an individual should not change much

Page 5: Differential Privacy Xintao Wu Oct 31, 2012

Motivation

• Publicly release statistical information about a dataset without compromising the privacy of any individual

Page 6: Differential Privacy Xintao Wu Oct 31, 2012

Requirement

• Anything that can be learned about a respondent from a statistical database should be learnable without access to the database

• Reduce the knowledge gained by joining the database

• Require that the probability distribution over the published results is essentially the same whether any individual opts in to, or opts out of, the dataset

Page 7: Differential Privacy Xintao Wu Oct 31, 2012

Definition
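The formal definition on this slide did not survive the transcript; the standard statement (Dwork, 2006), which the rest of the slides build on, is: a randomized mechanism K gives \epsilon-differential privacy if, for all pairs of datasets D_1, D_2 differing in at most one record, and for all S \subseteq Range(K),

Pr[K(D_1) \in S] \le e^{\epsilon} \cdot Pr[K(D_2) \in S]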

Page 8: Differential Privacy Xintao Wu Oct 31, 2012

Sensitivity function

• Captures how great a difference must be hidden by the additive noise
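The slide's formula is missing from the transcript; the usual L1 (global) sensitivity of a query f is

\Delta f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_1

taken over all pairs of datasets D_1, D_2 differing in at most one record. For a counting query \Delta f = 1; for a sum over values in [0, 10k] it is 10k, as in the later examples.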

Page 10: Differential Privacy Xintao Wu Oct 31, 2012

Gaussian noise
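The slide's formula is not in the transcript. For reference, the Gaussian mechanism adds N(0, \sigma^2) noise and achieves the relaxed (\epsilon, \delta)-differential privacy rather than pure \epsilon-DP; one standard calibration is

\sigma \ge \frac{\sqrt{2 \ln(1.25/\delta)} \, \Delta_2 f}{\epsilon}

where \Delta_2 f is the L2-sensitivity of the query.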

Page 11: Differential Privacy Xintao Wu Oct 31, 2012

Adding Laplace noise
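A minimal sketch of the Laplace mechanism in Python (function and parameter names are illustrative, not from the slides): sample noise from Lap(b) with scale b = \Delta f / \epsilon and add it to the true answer.

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        # Noise scale b = sensitivity / epsilon calibrates the noise
        # to the query's sensitivity and the privacy budget.
        rng = rng if rng is not None else np.random.default_rng()
        return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: a counting query has sensitivity 1.
    # noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)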

Page 12: Differential Privacy Xintao Wu Oct 31, 2012

Proof sketch
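The sketch itself is missing from the transcript; the key step is bounding the ratio of output densities. Writing the Laplace mechanism's output density on input D as p_D(z) \propto \exp(-\epsilon |z - f(D)| / \Delta f), for neighboring D_1, D_2 and any output z,

\frac{p_{D_1}(z)}{p_{D_2}(z)} = \exp\!\left( \frac{\epsilon \, (|z - f(D_2)| - |z - f(D_1)|)}{\Delta f} \right) \le \exp\!\left( \frac{\epsilon \, |f(D_1) - f(D_2)|}{\Delta f} \right) \le e^{\epsilon}

by the triangle inequality and the definition of \Delta f.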

Pages 13-21: Differential Privacy Xintao Wu Oct 31, 2012

Plots of the Laplace noise distribution for varying parameters (only the panel titles survive in the transcript):

• Delta_f=1, epsilon varies (separate plots for epsilon = 0.01, 0.1, 1, 2, 10)

• Delta_f=2, epsilon varies

• Delta_f=3, epsilon varies

• Delta_f=10000, epsilon varies

Page 22: Differential Privacy Xintao Wu Oct 31, 2012

Composition

• Sequential composition

• Parallel composition – for disjoint sets, the ultimate privacy guarantee depends only on the worst of the guarantees of each analysis, not the sum.
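In symbols: if analyses M_1, ..., M_k are \epsilon_1, ..., \epsilon_k-differentially private, then running all of them on the same data is (\sum_i \epsilon_i)-differentially private (sequential composition), while running each M_i on a disjoint subset of the data is (\max_i \epsilon_i)-differentially private (parallel composition).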

Page 23: Differential Privacy Xintao Wu Oct 31, 2012

Example

• Let us assume a table with 1000 customers, where each record has attributes: name, gender, city, cancer, salary.

  – For attribute city, we assume the domain size is 10;

  – for attribute cancer, we only record Yes or No for each customer;

  – for attribute salary, the domain range is 0-10k.

  – The privacy threshold \epsilon is a constant 0.1 set by the data owner.

• For one single query: “How many customers got cancer?”

• The adversary is allowed to ask the above query three times (a worked sketch follows).
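A worked sketch of these two cases in Python (a minimal sketch; true_count is a hypothetical value, and the three-query case assumes the budget is split evenly under sequential composition):

    import numpy as np

    rng = np.random.default_rng()
    epsilon = 0.1        # total privacy budget set by the data owner
    true_count = 120     # hypothetical number of customers with cancer

    # Single count query: sensitivity 1, so Laplace scale = 1 / 0.1 = 10.
    noisy_once = true_count + rng.laplace(scale=1.0 / epsilon)

    # The adversary asks the same query three times: by sequential
    # composition the budgets add, so give each query epsilon / 3,
    # which inflates the noise scale to 1 / (0.1 / 3) = 30.
    per_query_eps = epsilon / 3
    noisy_three = [true_count + rng.laplace(scale=1.0 / per_query_eps)
                   for _ in range(3)]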

 

Page 24: Differential Privacy Xintao Wu Oct 31, 2012

Example (continued)

• “How many customers got cancer in each city?”

• For one single query “What is the sum of salaries across all customers?”
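Worked out under the same assumptions as above: the ten per-city cancer counts are computed on disjoint sets of customers, so by parallel composition each count can be released with Lap(1/0.1) = Lap(10) noise at a total cost of \epsilon = 0.1. The salary sum has sensitivity \Delta f = 10,000, since one customer can change the sum by at most the top of the salary range, so it needs Lap(10000/0.1) = Lap(100000) noise; this is the case the earlier Delta_f=10000 plot illustrates.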

Page 25: Differential Privacy Xintao Wu Oct 31, 2012

Type of computation (query)

• some queries are very sensitive, others are not

• single query vs. query sequence

• queries on disjoint sets or not

• expected outcome: a number vs. an arbitrary output

• interactive vs. non-interactive

Page 26: Differential Privacy Xintao Wu Oct 31, 2012

Sensitivity

• Global sensitivity

• Local sensitivity

• Smooth sensitivity
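The slide formulas are missing from the transcript; the standard definitions (local and smooth sensitivity are due to Nissim, Raskhodnikova, and Smith) are:

• Global sensitivity: GS_f = \max_{D, D'} \|f(D) - f(D')\|_1 over all neighboring datasets D, D'; calibrating noise to GS_f protects every possible input.

• Local sensitivity: LS_f(D) = \max_{D'} \|f(D) - f(D')\|_1 over neighbors D' of the actual input D; often much smaller than GS_f, but adding noise proportional to it directly can itself leak information.

• Smooth sensitivity: a \beta-smooth upper bound S on LS_f, i.e. S(D) \ge LS_f(D) and S(D) \le e^{\beta} S(D') for neighboring D, D'; noise calibrated to S preserves privacy with far less noise than the global bound on many inputs.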

Page 27: Differential Privacy Xintao Wu Oct 31, 2012

Different areas of DP

• PINQ

• Data mining with DP

• Optimizing linear counting queries under differential privacy
  – matrix mechanism for answering a workload of predicate counting queries

Page 28: Differential Privacy Xintao Wu Oct 31, 2012

Privacy-preserving data mining (PPDM) interface – PINQ

• A programmable privacy preserving layer

• Add calibrated noise to each query

• Need to assign a privacy cost budget (a toy sketch of such a layer follows)
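A toy sketch of such a layer in Python (illustrative only; the real PINQ is a C# LINQ-based layer by McSherry, and none of these names come from its API): the layer holds the raw data, tracks the remaining budget, and only releases noisy counts.

    import numpy as np

    class PrivateDataset:
        """Toy privacy-preserving query layer: budget tracking + noisy counts."""

        def __init__(self, records, total_budget):
            self._records = records
            self._remaining = total_budget
            self._rng = np.random.default_rng()

        def noisy_count(self, predicate, epsilon):
            # Refuse queries that would exceed the assigned privacy budget.
            if epsilon > self._remaining:
                raise ValueError("privacy budget exhausted")
            self._remaining -= epsilon
            true_count = sum(1 for r in self._records if predicate(r))
            # A count has sensitivity 1, so Laplace noise with scale 1/epsilon.
            return true_count + self._rng.laplace(scale=1.0 / epsilon)

    # Usage: ds = PrivateDataset(customers, total_budget=0.1)
    #        ds.noisy_count(lambda r: r["cancer"] == "Yes", epsilon=0.05)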

Page 29: Differential Privacy Xintao Wu Oct 31, 2012

Data Mining with DP

• Previous approach – the privacy-preserving interface takes care of everything about DP

• Problem – inferior results if the interface is used naively during data mining

• Solution – consider privacy and mining together

• DP ID3
  – noisy counts
  – evaluate all attributes in one exponential mechanism query using the entire budget, instead of splitting the budget among multiple queries (see the sketch below)
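A minimal sketch of that single exponential-mechanism selection step in Python (illustrative names; assumes per-attribute quality scores such as information gain, with sensitivity \Delta q): the attribute is sampled with probability proportional to exp(\epsilon q(a) / (2 \Delta q)).

    import numpy as np

    def exp_mechanism_choice(attributes, quality, epsilon, sensitivity, rng=None):
        # Sample one attribute with Pr[a] proportional to
        # exp(epsilon * quality[a] / (2 * sensitivity)).
        rng = rng if rng is not None else np.random.default_rng()
        scores = np.array([quality[a] for a in attributes], dtype=float)
        logits = epsilon * scores / (2.0 * sensitivity)
        logits -= logits.max()          # subtract max for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        return attributes[rng.choice(len(attributes), p=probs)]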

Page 30: Differential Privacy Xintao Wu Oct 31, 2012

DP in Social Networks

• See pages 97-120 of the PAKDD'11 tutorial