Post on 19-Dec-2015
![Page 1: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/1.jpg)
Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification
Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu
Department of Electrical Engineering & Computer Science
Syracuse University, Syracuse, New York
![Page 2: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/2.jpg)
Introduction
Privacy-Preserving Data Publishing.
The impact of background knowledge: How does it affect privacy? How can we measure its impact on privacy?
Integrating background knowledge into privacy quantification. Privacy-MaxEnt: a systematic approach based on well-established theories.
Evaluation.
![Page 3: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/3.jpg)
Privacy-Preserving Data Publishing
Data disguise methods:
Randomization
Generalization (e.g., Mondrian)
Bucketization (e.g., Anatomy)
Our Privacy-MaxEnt method can be applied to both Generalization and Bucketization. We use Bucketization in this presentation.
![Page 4: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/4.jpg)
Data Sets
Attribute types: Identifier, Quasi-Identifier (QI), Sensitive Attribute (SA)
![Page 5: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/5.jpg)
Bucketized Data
P(Breast cancer | {female, college}, bucket = 1) = 1/4
P(Breast cancer | {female, junior}, bucket = 2) = 1/3
Columns: Quasi-Identifier (QI), Sensitive Attribute (SA)
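Without background knowledge, bucketization breaks the link between QI values and SA values inside a bucket, so the estimate of P(s | q, b) is just the frequency of s within bucket b. A minimal sketch; the bucket contents are hypothetical and chosen only so the bucket sizes and breast-cancer counts reproduce the 1/4 and 1/3 above.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical bucketized table: bucket id -> (QI tuples, SA values).
buckets = {
    1: ([("female", "college"), ("male", "college"), ("male", "graduate"), ("female", "college")],
        ["breast cancer", "flu", "diabetes", "flu"]),
    2: ([("female", "junior"), ("male", "junior"), ("male", "senior")],
        ["breast cancer", "HIV", "flu"]),
}

def naive_estimate(bucket_id, sa_value):
    """P(s | q, b) with no background knowledge: frequency of s in bucket b.

    The QI value q is not an argument because, within a bucket, every QI
    value gets the same estimate.
    """
    _, sa_values = buckets[bucket_id]
    return Fraction(Counter(sa_values)[sa_value], len(sa_values))

print(naive_estimate(1, "breast cancer"))  # 1/4
print(naive_estimate(2, "breast cancer"))  # 1/3
```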
![Page 6: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/6.jpg)
Impact of Background Knowledge
Background Knowledge:
It is rare for males to have breast cancer.
This analysis is hard to carry out by hand for large data sets.
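One way to see the impact: enumerate every assignment of a bucket's SA values to its records (the original data is one of them), discard the assignments that contradict the knowledge, and count. A brute-force sketch over a hypothetical four-record bucket; for simplicity it treats "rare" as "never".

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bucket: QI gender of each record, and the bucket's SA values.
genders = ["female", "male", "male", "female"]
diseases = ["breast cancer", "flu", "diabetes", "flu"]

def prob_record_has(record, disease, knowledge=None):
    """Fraction of the possible assignments in which `record` has `disease`.

    An assignment maps the bucket's SA values onto its records. `knowledge`
    is an optional predicate that rejects contradicting assignments.
    """
    assignments = [a for a in permutations(diseases)
                   if knowledge is None or knowledge(a)]
    hits = sum(1 for a in assignments if a[record] == disease)
    return Fraction(hits, len(assignments))

def no_male_breast_cancer(assignment):
    # "Rare" is simplified to "never" so that the sketch stays deterministic.
    return all(not (g == "male" and d == "breast cancer")
               for g, d in zip(genders, assignment))

print(prob_record_has(0, "breast cancer"))                         # 1/4
print(prob_record_has(0, "breast cancer", no_male_breast_cancer))  # 1/2
```

The enumeration grows as n! in the bucket size n, which is why this style of analysis does not scale to large data sets.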
![Page 7: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/7.jpg)
Previous Studies
Martin et al., ICDE’07: the first formal study on background knowledge.
Chen, LeFevre, Ramakrishnan, VLDB’07: improves on the previous work.
Both deal with rule-based, deterministic knowledge.
Background knowledge can be much more complicated, e.g., uncertain knowledge.
![Page 8: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/8.jpg)
Complicated Background Knowledge
Rule-based knowledge: P(s | q) = 1; P(s | q) = 0.
Probability-based knowledge: P(s | q) = 0.2; P(s | Alice) = 0.2.
Vague background knowledge: 0.3 ≤ P(s | q) ≤ 0.5.
Miscellaneous types: P(s | q1) + P(s | q2) = 0.7; one of Alice and Bob has “Lung Cancer”.
![Page 9: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/9.jpg)
Challenges
How can we analyze privacy systematically for large data sets and complicated background knowledge? Directly computing P(S | Q) is hard.
What do we want to compute? P(S | Q), given the background knowledge and the published data set. P(S | Q) is the primitive underlying most privacy metrics.
![Page 10: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/10.jpg)
Our Approach
Consider P(S | Q) as a variable x (a vector).
Public information: Background Knowledge → Constraints on x; Published Data → Constraints on x.
Solve for x: the most unbiased solution.
![Page 11: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/11.jpg)
Maximum Entropy Principle
“Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum entropy estimate. It is the least biased estimate possible on the given information.” (E. T. Jaynes, 1957)
![Page 12: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/12.jpg)
The MaxEnt Approach
Public information: Background Knowledge → Constraints on P(S | Q); Published Data → Constraints on P(S | Q).
Estimate P(S | Q) with the Maximum Entropy Estimate.
![Page 13: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/13.jpg)
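Entropy
Because H(S | Q, B) = H(Q, S, B) − H(Q, B), and P(Q, B) is fixed by the published data, maximizing H(Q, S, B) also maximizes H(S | Q, B). The constraints should therefore use P(Q, S, B) as variables.
$$H(S \mid Q, B) = -\sum_{Q,S,B} P(Q, B)\, P(S \mid Q, B) \log P(S \mid Q, B)$$
$$H(Q, S, B) = -\sum_{Q,S,B} P(Q, S, B) \log P(Q, S, B)$$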
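The identity H(S | Q, B) = H(Q, S, B) − H(Q, B) is the chain rule for entropy. A quick numerical check on a random joint distribution over small hypothetical domains (3 QI values, 4 SA values, 2 buckets):

```python
import math
import random

def entropy(probs):
    """Shannon entropy (natural log) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Random joint distribution P(Q, S, B) over tiny hypothetical domains.
random.seed(0)
cells = [(q, s, b) for q in range(3) for s in range(4) for b in range(2)]
weights = [random.random() for _ in cells]
total = sum(weights)
joint = {cell: w / total for cell, w in zip(cells, weights)}

# Marginal P(Q, B), obtained by summing out S.
p_qb = {}
for (q, s, b), p in joint.items():
    p_qb[(q, b)] = p_qb.get((q, b), 0.0) + p

# H(S | Q, B) from its definition, using P(S | Q, B) = P(Q, S, B) / P(Q, B) ...
h_s_given_qb = -sum(p * math.log(p / p_qb[(q, b)]) for (q, s, b), p in joint.items())
# ... agrees with H(Q, S, B) - H(Q, B).
chain_rule = entropy(joint.values()) - entropy(p_qb.values())
print(math.isclose(h_s_given_qb, chain_rule))  # True
```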
![Page 14: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/14.jpg)
Maximum Entropy Estimate
Let vector x = P(Q, S, B). Find the value of x that maximizes its entropy H(Q, S, B) while satisfying:
h1(x) = c1, …, hu(x) = cu (equality constraints)
g1(x) ≤ d1, …, gv(x) ≤ dv (inequality constraints)
This is a special case of non-linear programming.
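When the equality constraints are linear, A x = c, Lagrange multipliers reduce the problem to an unconstrained one: the maximizer has the form $x_i = \exp\big(\sum_j \lambda_j A_{ji} - 1\big)$, and λ minimizes the convex dual $g(\lambda) = \sum_i x_i(\lambda) - \lambda \cdot c$, whose gradient is $A x - c$. A minimal sketch under those assumptions: it handles equality constraints only, plain gradient descent stands in for the L-BFGS solver used in the evaluation, and the bucket and knowledge value are hypothetical.

```python
import math

def maxent(A, c, lr=0.5, iters=20000, tol=1e-10):
    """Maximize -sum(x log x) subject to A x = c (A is a list of rows).

    Minimizes the Lagrange dual sum_i exp((A^T lam)_i - 1) - lam . c by
    gradient descent; the dual gradient is A x - c. Assumes the constraints
    are feasible with strictly positive x and imply sum(x) = 1.
    Inequality constraints are not handled.
    """
    m, n = len(A), len(A[0])
    lam = [0.0] * m
    for _ in range(iters):
        x = [math.exp(sum(lam[j] * A[j][i] for j in range(m)) - 1.0) for i in range(n)]
        grad = [sum(A[j][i] * x[i] for i in range(n)) - c[j] for j in range(m)]
        if max(abs(g) for g in grad) < tol:
            break
        lam = [lam[j] - lr * grad[j] for j in range(m)]
    return x

# Hypothetical bucket of 4 records: QI q1 x2, q2 x2; SA s1 x1, s2 x3.
# Variables x = P(q, s, b=1) in the order (q1,s1), (q1,s2), (q2,s1), (q2,s2).
A = [[1, 1, 0, 0],   # P(q1, b) = 2/4
     [0, 0, 1, 1],   # P(q2, b) = 2/4
     [1, 0, 1, 0],   # P(s1, b) = 1/4
     [0, 1, 0, 1]]   # P(s2, b) = 3/4
c = [0.5, 0.5, 0.25, 0.75]
print([round(v, 4) for v in maxent(A, c)])  # [0.125, 0.375, 0.125, 0.375]

# Knowledge P(s1 | q2) = 0.3, written linearly: 0.7*P(q2,s1) - 0.3*P(q2,s2) = 0.
print([round(v, 4) for v in maxent(A + [[0, 0, 0.7, -0.3]], c + [0.0])])  # [0.1, 0.4, 0.15, 0.35]
```

With no knowledge, the estimate makes QI and SA independent inside the bucket, so every record in it gets P(s1 | q, b) = 1/4; the knowledge row then shifts the mass of s1 between q1 and q2.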
![Page 15: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/15.jpg)
Constraints from Knowledge
Linear model: quite generic.
Conditional probability: P(S | Q) = P(Q, S) / P(Q).
Background knowledge has nothing to do with B: P(Q, S) = P(Q, S, B=1) + … + P(Q, S, B=m).
Background Knowledge → Constraints on P(Q, S, B)
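Multiplying P(s | q) = t through by P(q) turns the knowledge into a constraint that is linear in the variables P(q, s, b). A minimal sketch that builds the coefficients; the variable list and the value t = 0.2 are hypothetical.

```python
from fractions import Fraction

def conditional_constraint(variables, q, s, t):
    """Coefficients of the linear form P(q, s) - t * P(q) over variables P(q', s', b).

    P(s | q) = t is equivalent to P(q, s) - t * P(q) = 0, with
    P(q, s) = sum_b P(q, s, b) and P(q) = sum_{s', b} P(q, s', b).
    The same form with <= or >= encodes vague knowledge such as
    0.3 <= P(s | q) <= 0.5 as two inequalities.
    """
    return {v: (1 if v[1] == s else 0) - t for v in variables if v[0] == q}

# Hypothetical variables P(q, s, b) left after removing impossible combinations.
variables = [("q1", "s1", 1), ("q1", "s2", 1), ("q1", "s1", 2), ("q1", "s4", 2), ("q2", "s1", 1)]
row = conditional_constraint(variables, "q1", "s1", Fraction(1, 5))  # P(s1 | q1) = 0.2
for v, coeff in row.items():
    print(v, coeff)
```

Variables for other QI values (here q2) get no coefficient, and the bucket index never affects a coefficient, matching the observation that background knowledge has nothing to do with B.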
![Page 16: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/16.jpg)
Constraints from Published Data
Constraints: the truth and only the truth. They are absolutely correct for the original data set; no inference is involved.
Published Data Set D’ → Constraints on P(Q, S, B)
![Page 17: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/17.jpg)
Assignment and Constraints
Observation: the original data is one of the assignments.
Constraint: true for all possible assignments.
![Page 18: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/18.jpg)
QI Constraint
Constraint: $\sum_{j=1}^{h} P(q, s_j, b) = P(q, b)$
Example: $P(q_1, s_1, 1) + P(q_1, s_2, 1) + P(q_1, s_3, 1) = P(q_1, 1) = 0.2$
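QI constraints can be generated mechanically from a bucketized table. A minimal sketch over a hypothetical table of N = 10 records, chosen so that its first constraint matches the example P(q1, 1) = 0.2:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical bucketized table: bucket id -> (QI values, SA values).
buckets = {
    1: (["q1", "q1", "q2", "q3"], ["s1", "s2", "s3", "s1"]),
    2: (["q1", "q3", "q4"], ["s4", "s1", "s5"]),
    3: (["q2", "q5", "q6"], ["s4", "s4", "s6"]),
}
N = sum(len(qis) for qis, _ in buckets.values())

def qi_constraints(buckets, n):
    """One constraint per (QI value q, bucket b): sum_j P(q, s_j, b) = P(q, b).

    s_j ranges over the distinct SA values of bucket b; P(q, b) is the
    fraction of all n records that are in bucket b and have QI value q.
    """
    constraints = []
    for b, (qis, sas) in buckets.items():
        for q, count in sorted(Counter(qis).items()):
            lhs = [(q, s, b) for s in sorted(set(sas))]
            constraints.append((lhs, Fraction(count, n)))
    return constraints

lhs, rhs = qi_constraints(buckets, N)[0]
print(" + ".join(f"P{v}" for v in lhs), "=", rhs)
# P('q1', 's1', 1) + P('q1', 's2', 1) + P('q1', 's3', 1) = 1/5
```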
![Page 19: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/19.jpg)
SA Constraint
Constraint: $\sum_{i=1}^{g} P(q_i, s, b) = P(s, b)$
Example: $P(q_1, s_4, 2) + P(q_3, s_4, 2) + P(q_4, s_4, 2) = P(s_4, 2) = 0.1$
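SA constraints are the mirror image of QI constraints: one per SA value in each bucket. A minimal sketch over a hypothetical table of N = 10 records, chosen so that bucket 2 reproduces the example P(s4, 2) = 0.1:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical bucketized table: bucket id -> (QI values, SA values).
buckets = {
    1: (["q1", "q1", "q2", "q3"], ["s1", "s2", "s3", "s1"]),
    2: (["q1", "q3", "q4"], ["s4", "s1", "s5"]),
    3: (["q2", "q5", "q6"], ["s4", "s4", "s6"]),
}
N = sum(len(qis) for qis, _ in buckets.values())

def sa_constraints(buckets, n):
    """One constraint per (SA value s, bucket b): sum_i P(q_i, s, b) = P(s, b).

    q_i ranges over the distinct QI values of bucket b; P(s, b) is the
    fraction of all n records that are in bucket b and have SA value s.
    """
    constraints = []
    for b, (qis, sas) in buckets.items():
        for s, count in sorted(Counter(sas).items()):
            lhs = [(q, s, b) for q in sorted(set(qis))]
            constraints.append((lhs, Fraction(count, n)))
    return constraints

for lhs, rhs in sa_constraints(buckets, N):
    if lhs[0][1:] == ("s4", 2):
        print(" + ".join(f"P{v}" for v in lhs), "=", rhs)
# P('q1', 's4', 2) + P('q3', 's4', 2) + P('q4', 's4', 2) = 1/10
```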
![Page 20: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/20.jpg)
Zero Constraint
P(q, s, b) = 0 if q or s does not appear in bucket b. This lets us reduce the number of variables.
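The saving from the zero constraint is easy to count. A minimal sketch over a hypothetical bucketized table with 6 QI values, 6 SA values, and 3 buckets:

```python
# Hypothetical bucketized table: bucket id -> (QI values, SA values).
buckets = {
    1: (["q1", "q1", "q2", "q3"], ["s1", "s2", "s3", "s1"]),
    2: (["q1", "q3", "q4"], ["s4", "s1", "s5"]),
    3: (["q2", "q5", "q6"], ["s4", "s4", "s6"]),
}

all_q = {q for qis, _ in buckets.values() for q in qis}
all_s = {s for _, sas in buckets.values() for s in sas}

# Without the zero constraint: one variable per (q, s, b) combination.
naive = len(all_q) * len(all_s) * len(buckets)

# With it: keep (q, s, b) only when both q and s appear in bucket b.
kept = [(q, s, b) for b, (qis, sas) in buckets.items()
        for q in sorted(set(qis)) for s in sorted(set(sas))]
print(naive, "->", len(kept))  # 108 -> 24
```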
![Page 21: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/21.jpg)
Theoretical Properties
Soundness: Are the constraints correct? Easy to prove.
Completeness: Have we missed any constraint? See our theorems and proofs.
Conciseness: Are there redundant constraints? Only one redundant constraint in each bucket.
Consistency: Is our approach consistent with the existing methods when the background knowledge is Ø?
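The conciseness claim can be checked for the QI and SA constraints of a single bucket: all QI rows and all SA rows each sum to the all-ones row, so exactly one row is redundant. A minimal sketch using exact rational Gaussian elimination; the distinct values in the bucket are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def bucket_constraint_rows(qis, sas):
    """QI and SA constraint rows of one bucket over its variables P(q, s, b)."""
    variables = [(q, s) for q in qis for s in sas]
    rows = [[1 if v[0] == q else 0 for v in variables] for q in qis]
    rows += [[1 if v[1] == s else 0 for v in variables] for s in sas]
    return rows

# Distinct QI and SA values of a hypothetical bucket.
rows = bucket_constraint_rows(["q1", "q2", "q3"], ["s1", "s2", "s3", "s4"])
print(len(rows), rank(rows))  # 7 6
```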
![Page 22: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/22.jpg)
Completeness w.r.t. Equations
Have we missed any equality constraint? Yes! If F1 = C1 and F2 = C2 are constraints, then F1 + F2 = C1 + C2 is too. However, it is redundant.
Completeness Theorem: Let U be our constraint set. Every linear equality constraint that holds for all possible assignments can be written as a linear combination of the constraints in U.
![Page 23: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/23.jpg)
Completeness w.r.t. Inequalities
Have we missed any inequality constraint? Yes! If F = C, then F ≤ C + 0.2 is also valid (but redundant).
Completeness Theorem: Our constraint set is also complete in the inequality sense.
![Page 24: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/24.jpg)
Putting Them Together
Public information: Background Knowledge → Constraints on P(S | Q); Published Data → Constraints on P(S | Q).
Estimate P(S | Q) with the Maximum Entropy Estimate.
Tools: LBFGS, TOMLAB, KNITRO, etc.
![Page 25: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/25.jpg)
Inevitable Questions
Where do we get background knowledge? Do we have to be very, very knowledgeable?
For P(s | q)-type knowledge, all useful knowledge is in the original data set, in the form of association rules:
Positive: Q ⇒ S
Negative: Q ⇒ ¬S, ¬Q ⇒ S, ¬Q ⇒ ¬S
To bound the knowledge in our study, we use the Top-K strongest association rules.
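Mining such rules from the original data is straightforward. A minimal sketch of positive rules only, assuming "strongest" means highest confidence P(s | q) (the measure is not specified here); each selected rule would then enter the model as knowledge P(s | q) = confidence. The records are hypothetical.

```python
from collections import Counter

def top_k_rules(records, k):
    """Top-k positive rules q => s, ranked by confidence P(s | q) = count(q, s) / count(q).

    `records` is a list of (qi, sa) pairs from the original data. Ties are
    broken by the rule itself so the output is deterministic.
    """
    q_count = Counter(q for q, _ in records)
    qs_count = Counter(records)
    rules = [((q, s), qs_count[(q, s)] / q_count[q]) for (q, s) in qs_count]
    rules.sort(key=lambda r: (-r[1], r[0]))
    return rules[:k]

records = [("female", "breast cancer"), ("female", "flu"),
           ("male", "flu"), ("male", "flu"), ("male", "diabetes")]
for (q, s), conf in top_k_rules(records, 2):
    print(f"P({s} | {q}) = {conf:.3f}")
# P(flu | male) = 0.667
# P(breast cancer | female) = 0.500
```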
![Page 26: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/26.jpg)
Knowledge about Individuals
Alice: (i1, q1); Bob: (i4, q2); Charlie: (i9, q5)
Knowledge 1: Alice has either s1 or s4.
Constraint: $P(i_1, q_1, s_1, 1) + P(i_1, q_1, s_1, 2) + P(i_1, q_1, s_4, 2) = P(i_1, q_1) = \frac{1}{N}$
Knowledge 2: Two people among Alice, Bob, and Charlie have s4.
Constraint: $P(i_1, q_1, s_4, 2) + P(i_4, q_2, s_4, 3) + P(i_9, q_5, s_4, 3) = \frac{2}{N}$
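Soundness means the original data, viewed as one assignment in which each real record has probability 1/N, satisfies every constraint. A minimal check of the two knowledge statements about Alice, Bob, and Charlie on a hypothetical original table (all other identifiers and values are invented for illustration):

```python
from fractions import Fraction

# Hypothetical original data: (identifier, QI, SA, bucket) for N = 10 people.
# Alice = (i1, q1), Bob = (i4, q2), Charlie = (i9, q5).
original = [
    ("i1", "q1", "s1", 1), ("i2", "q1", "s2", 1), ("i3", "q2", "s3", 1), ("i5", "q3", "s1", 1),
    ("i6", "q1", "s4", 2), ("i7", "q3", "s1", 2), ("i8", "q4", "s5", 2),
    ("i4", "q2", "s4", 3), ("i9", "q5", "s4", 3), ("i10", "q6", "s6", 3),
]
N = len(original)

# The original data as one assignment: each real record has probability 1/N.
assignment = {record: Fraction(1, N) for record in original}

def evaluate(terms):
    """Sum of P(i, q, s, b) over `terms` under the assignment (missing terms are 0)."""
    return sum((assignment.get(t, Fraction(0)) for t in terms), Fraction(0))

# Knowledge 1: Alice has either s1 or s4.
k1 = [("i1", "q1", "s1", 1), ("i1", "q1", "s1", 2), ("i1", "q1", "s4", 2)]
# Knowledge 2: two people among Alice, Bob, and Charlie have s4.
k2 = [("i1", "q1", "s4", 2), ("i4", "q2", "s4", 3), ("i9", "q5", "s4", 3)]

print(evaluate(k1) == Fraction(1, N), evaluate(k2) == Fraction(2, N))  # True True
```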
![Page 27: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/27.jpg)
Evaluation
Implementation:
Lagrange multipliers: turn the constrained optimization into an unconstrained optimization.
LBFGS: solves the unconstrained optimization problem.
Pentium 3 GHz CPU with 4 GB memory.
![Page 28: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/28.jpg)
Privacy versus Knowledge
Estimation accuracy: KL distance between P(MaxEnt)(S | Q) and P(Original)(S | Q).
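The accuracy metric can be computed for each QI value as below. The direction of the divergence and how it is aggregated over Q are not specified here; this sketch assumes KL(P_Original || P_MaxEnt) with the natural log, and the two distributions are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_s p(s) * log(p(s) / q(s)), using the natural log.

    Terms with p(s) = 0 contribute nothing; q(s) must be > 0 wherever p(s) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical P(S | q) for one QI value q over three SA values.
p_original = [0.5, 0.5, 0.0]
p_maxent = [0.25, 0.5, 0.25]
print(round(kl_divergence(p_original, p_maxent), 4))  # 0.3466
```

A smaller distance means the MaxEnt estimate is closer to the original data, so less privacy remains.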
![Page 29: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/29.jpg)
Privacy versus # of QI attributes
![Page 30: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/30.jpg)
Performance vs. Knowledge
![Page 31: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/31.jpg)
Running Time vs. Data Size
![Page 32: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/32.jpg)
Iteration vs. Data size
![Page 33: Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d375503460f94a104bd/html5/thumbnails/33.jpg)
Conclusion
Privacy-MaxEnt is a systematic method:
Models various types of knowledge.
Models the information from the published data.
Is based on well-established theory.
Future work:
Reducing the number of constraints.
Vague background knowledge.
Background knowledge about individuals.