P4P: A Practical Framework for Privacy-Preserving Distributed Computation
Yitao Duan (Advisor: Prof. John Canny)
http://www.cs.berkeley.edu/~duan
Berkeley Institute of Design, Computer Science Division
University of California, Berkeley
11/27/2007
Research Goal
To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale.
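Model
[Figure: users u1, u2, …, un-1, un, each holding private data d1, d2, …, dn-1, dn, jointly compute a function f; each user's data must be obfuscated.]
Challenge: standard cryptographic tools are not feasible at large scale.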
A Practical Solution
• Provable privacy: cryptography
• Efficiency: minimize the number of expensive primitives and rely on probabilistic guarantees
• Realistic adversary model: must handle malicious users who may try to bias the computation by inputting invalid data
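Basic Approach
[Figure: each user ui evaluates gj(di) locally; the values gj(d1), gj(d2), …, gj(dn) are summed with cryptographic privacy.]
$f$ is computed from the sums $\sum_{i=1}^{n} g_j(d_i)$, where $d_i \in D$ and $j = 1, 2, \dots, m$.
No leakage beyond the final result for many algorithms.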
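As a minimal illustration of this decomposition (the choice of f and of the g_j below is ours, not from the slides), the mean and variance of scalar user values can be computed from m = 3 sums with g1(d) = 1, g2(d) = d and g3(d) = d². The final step touches only the three sums, never an individual d_i.

```python
from fractions import Fraction

def g(d):
    """Hypothetical per-user functions g_1, g_2, g_3 evaluated on one user's value."""
    return (1, d, d * d)

def aggregate(user_values):
    """Compute the m = 3 sums sum_i g_j(d_i); in P4P this is the private step."""
    sums = [0, 0, 0]
    for d in user_values:
        for j, gj in enumerate(g(d)):
            sums[j] += gj
    return sums

def mean_and_variance(sums):
    """f is evaluated from the aggregated sums alone."""
    n, s1, s2 = sums
    mean = Fraction(s1, n)
    var = Fraction(s2, n) - mean * mean
    return mean, var

mean, var = mean_and_variance(aggregate([2, 4, 4, 4, 5, 5, 7, 9]))
print(mean, var)  # 5 4
```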
The Power of Addition
A large number of popular algorithms can be run with addition-only steps:
• Linear algorithms: voting and summation; nonlinear algorithms: regression, classification, SVD, PCA, k-means, ID3, EM, etc.
• All algorithms in the statistical query model [Kearns 93]
• Many other gradient-based numerical algorithms
The addition-only framework has a very efficient private implementation in cryptography and admits efficient ZKPs.
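To make the claim concrete for one of the nonlinear algorithms listed, here is a sketch of a single k-means update in which the only cross-user step is a vector sum. The encoding (each user contributes, for the cluster they are assigned to, their coordinates plus a count of 1) and the assumption that the current centroids are public are ours, for illustration.

```python
def local_contribution(point, centroids):
    """One user's vector: per cluster, the point's coordinates (if assigned) and a count."""
    dim, k = len(point), len(centroids)
    # Assign the point to its nearest centroid (squared Euclidean distance).
    best = min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
    vec = [0.0] * (k * (dim + 1))
    base = best * (dim + 1)
    for t, x in enumerate(point):
        vec[base + t] = x
    vec[base + dim] = 1.0
    return vec

def kmeans_step(points, centroids):
    dim, k = len(points[0]), len(centroids)
    total = [0.0] * (k * (dim + 1))
    for p in points:  # addition-only aggregation across users
        for idx, val in enumerate(local_contribution(p, centroids)):
            total[idx] += val
    new = []
    for c in range(k):
        base = c * (dim + 1)
        count = total[base + dim]
        if count == 0:
            new.append(list(centroids[c]))  # keep an empty cluster's centroid
        else:
            new.append([total[base + t] / count for t in range(dim)])
    return new

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)]))  # [[0.0, 0.5], [10.0, 10.5]]
```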
Peers for Privacy: The Nomenclature
Privacy is a right that one must fight for. Some agents must act on behalf of users' privacy in the computation. We call them privacy peers.
Our method aggregates across many users' data. We can prove that the aggregation provides privacy: the peers' data protect each other.
Private Addition – P4P Style
• The computation: secret sharing over a small field
• Malicious users: an efficient zero-knowledge proof to bound the L2-norm of the user vector
Big Integers vs. Small Ones
Most applications work with “regular-sized” integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~$10^{-9}$ sec).
Public-key operations (e.g. those used in encryption and verification) must use keys of sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~$10^{-3}$ sec).
A difference of 6 orders of magnitude!
Private Arithmetic: Two Paradigms
Homomorphism: User data is encrypted with a public key cryptosystem. Arithmetic on this data mirrors arithmetic on the original data, but the server cannot decrypt partial results.
Secret-sharing: User sends shares of their data to several servers, so that no small group of servers gains any information about it.
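A toy illustration of the homomorphism paradigm, using the Paillier cryptosystem (our choice of scheme; the slides do not name one) with tiny, insecure primes. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can add without decrypting. Every step is a modular exponentiation or multiplication mod n², which is why this paradigm needs large integers in practice (real moduli are on the order of the 1024-bit keys mentioned earlier).

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1 (tiny primes: NOT secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu is simply lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Enc(m) = g^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: Enc(a) * Enc(b) mod n^2 decrypts to a + b mod n."""
    return (c1 * c2) % (n * n)

n, priv = keygen(1009, 1013)
rng = random.Random(0)
c = add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))
print(decrypt(n, priv, c))  # 42
```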
Arithmetic: Homomorphism vs. VSS
Homomorphism
+ Can tolerate t < n corrupted players as far as privacy is concerned
- Uses public-key crypto and works with large fields (e.g. 1024-bit): 10,000x more expensive than normal arithmetic (even for addition)
Secret sharing
+ Addition is essentially free; can use any size field
- Can't do two-party multiplication
- Most schemes also use public-key crypto for verification
- Doesn't fit well into existing service architectures
P4P: Peers for Privacy
• Some parties, called privacy peers, actively participate in the computation, working for users' privacy
• Privacy peers provide privacy when they are available, but can't access the data themselves
[Figure: a peer group of users (U) and privacy peers (P), alongside the server (S).]
P4P
• The server provides data archival and synchronizes the protocol
• The server only communicates with privacy peers occasionally (e.g., at 2AM)
Privacy Peers
Roles of privacy peers:
• Anonymizing communication
• Sharing information
• Participating in computation
• Other infrastructure support
They work on behalf of users' privacy.
But we need a higher level of trust in privacy peers.
Candidates for Privacy Peers
Some players are more trustworthy than others:
• In the workplace, a union representative
• In a community, a few members with good reputations
• Or a third-party commercial provider
A very important source of security and efficiency.
The key is that privacy peers should have different incentives from the server, so that there is mutual distrust between them.
Security from Heterogeneity
• The server is secure against outside attacks and won't actively cheat
  – Companies spend $$$ to protect their servers
  – The server often holds much more valuable info than what the protocol reveals
  – The server benefits from accurate computation
• Privacy peers won't collude with the server
  – Conflicts of interest, mutual distrust, laws
  – The server can't trust that clients will keep a conspiracy secret
• Users can actively cheat
• Rely on the server for protection against outside attacks, and on privacy peers for defending against a curious server
Private Addition
[Figure: user i splits di into two shares, ui and vi; one goes to the server and the other to the privacy peer.]
$u_i + v_i = d_i$
di: user i's private vector. ui, vi and di all lie in a small integer field.
The shares are aggregated separately: $\mu = \sum_i u_i$ and $\nu = \sum_i v_i$.
μ and ν are then combined, and $\mu + \nu = \sum_i (u_i + v_i) = \sum_i d_i$ is the desired sum.
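A minimal sketch of this two-share scheme, assuming a prime field of size P = 2^31 - 1 (a hypothetical choice of "small field") and that each user sends ui to the server and vi to the privacy peer. Each share on its own is uniformly random, so neither party learns di, yet the two aggregates add up to the true sum.

```python
import random

P = 2**31 - 1  # hypothetical small prime field; all arithmetic is mod P

def share(d, rng):
    """Split user vector d into random u and v with u + v = d (mod P)."""
    u = [rng.randrange(P) for _ in d]
    v = [(x - ux) % P for x, ux in zip(d, u)]
    return u, v

def vec_sum(vectors):
    """Component-wise sum of a list of vectors, mod P."""
    dim = len(vectors[0])
    return [sum(vec[j] for vec in vectors) % P for j in range(dim)]

rng = random.Random(0)
data = [[1, 0, 3], [2, 5, 0], [0, 1, 1]]  # each row is one user's d_i
shares = [share(d, rng) for d in data]
mu = vec_sum([u for u, _ in shares])      # computed by the server
nu = vec_sum([v for _, v in shares])      # computed by the privacy peer
total = [(a + b) % P for a, b in zip(mu, nu)]
print(total)  # [3, 6, 4]
```

All operations are on 31-bit integers, which is the point of the scheme: the cost is the same as a non-private sum.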
P4P's Private Addition
• Provable privacy
• Computation on both the server and the privacy peer is over a small field: the same cost as a non-private implementation
• Fits existing server-based schemes: the server is always online, while users and privacy peers can come and go
• Only two parties perform the computation; users just submit their data (and provide a ZK proof, see later)
• Extra communication for the server is only with the privacy peer, independent of n
The Need for Verification
This scheme has a glaring weakness: users can use any number in the small field as their data.
Think of a voting scheme: “Please place your vote, 0 or 1, in the envelope.” A cheating voter could instead submit Bush: 100,000 and Gore: -100,000.
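The attack takes only a few lines to reproduce. Assuming ballots are 2-entry vectors [Bush, Gore] aggregated in the same kind of small field (P = 2^31 - 1 is again a hypothetical choice), a single malformed ballot swamps the honest ones. Because the field accepts any value, the aggregation itself cannot tell it apart from a legitimate vote.

```python
P = 2**31 - 1  # hypothetical small prime field

def to_field(x):
    return x % P  # negative numbers wrap around, e.g. -1 -> P - 1

def from_field(y):
    return y if y <= P // 2 else y - P  # signed decoding of the result

def tally(ballots):
    """Sum [Bush, Gore] ballots in the field, then decode as signed integers."""
    totals = [0, 0]
    for b in ballots:
        for j in range(2):
            totals[j] = (totals[j] + to_field(b[j])) % P
    return [from_field(t) for t in totals]

honest = [[1, 0], [0, 1], [0, 1], [1, 0], [0, 1]]
print(tally(honest))                          # [2, 3]
print(tally(honest + [[100_000, -100_000]]))  # [100002, -99997]
```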
Zero Knowledge Proofs
I can prove that I know X without disclosing what X is.
I can prove that a given encrypted number is a 0, or that an encrypted number is a 1.
I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit (6 extra numbers needed).
I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)
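Applied to a whole user vector, these bit-by-bit proofs scale with both the vector length m and the bit width k. With purely illustrative values m = 10^4 and k = 20:

```latex
\text{extra numbers} = m \cdot 6k = 6km,
\qquad m = 10^{4},\; k = 20 \;\Longrightarrow\; 6 \cdot 20 \cdot 10^{4} = 1.2 \times 10^{6}.
```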
An Efficient ZKP of Boundedness
Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector is small.
The server asks for some random projections of the user’s vector, and expects the user to prove that the square sum of them is small.
• O(log m) public-key crypto operations (instead of O(m)) to prove that the L2-norm of an m-dim vector is smaller than L.
• Running time reduced from hours to seconds.
Bounding the L2-Norm
• A natural and effective way to restrict a cheating user's malicious influence: you must have a big vector to produce a large influence on the sum
• Perturbation theory bounds system change with norms: $|\sigma_i(A) - \sigma_i(B)| \le \|A - B\|_2$ [Weyl]
• Can be the basis for other checks: setting L = 1 forces each user to have only 1 vote
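A quick, non-private check of the voting example, assuming each ballot is an integer vector with one entry per candidate and the bound is applied as norm ≤ L. With L = 1, a single vote passes, while a double vote or the ±100,000 ballot fails. For integer vectors, norm ≤ 1 means at most one nonzero entry of absolute value 1, so a single -1 entry also passes.

```python
import math

def within_bound(ballot, L=1.0):
    """True iff the L2-norm of the ballot is at most L."""
    return math.hypot(*ballot) <= L

print(within_bound([0, 1, 0]))            # True: one vote
print(within_bound([1, 1, 0]))            # False: norm sqrt(2)
print(within_bound([100_000, -100_000]))  # False: norm about 141421
print(within_bound([0, -1, 0]))           # True: a single -1 entry also has norm 1
```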
Random Projection-based L2-Norm ZKP
• The server generates N random m-vectors in $\{-1, 0, +1\}^m$
• The user projects his data onto the N directions and provides a ZKP that the square sum of the projections is $< N L^2 / 2$
• Expensive public-key operations are only on the projections and the square sum
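The statistical idea behind this check can be simulated without any cryptography. In this sketch we assume each projection entry is -1, 0 or +1 with probabilities 1/4, 1/2, 1/4. Then E[c_i²] = 1/2 and the cross terms vanish in expectation, so the expected square sum of N projections is N|d|²/2. Comparing it against N L²/2 accepts vectors well inside the bound and rejects vectors well outside it. The real protocol proves this inequality in zero knowledge over encrypted projections, which the sketch does not attempt.

```python
import random

def projection_check(d, L, N, rng):
    """Non-private simulation: accept d iff the square sum of N projections is < N * L**2 / 2.

    Each projection vector has entries drawn uniformly from (-1, 0, 0, +1), i.e.
    -1 w.p. 1/4, 0 w.p. 1/2, +1 w.p. 1/4 (an assumed distribution), so that
    E[(c . d)^2] = |d|^2 / 2 and E[square sum] = N * |d|^2 / 2.
    """
    square_sum = 0
    for _ in range(N):
        c = [rng.choice((-1, 0, 0, 1)) for _ in d]
        proj = sum(ci * di for ci, di in zip(c, d))
        square_sum += proj * proj
    return square_sum < N * L * L / 2

print(projection_check([1, 0, 0, 0], L=10, N=50, rng=random.Random(0)))  # True
# Rejected unless all 50 projections happen to be 0 (probability (3/8)**50).
print(projection_check([100_000, -100_000], L=10, N=50, rng=random.Random(0)))
```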
Effectiveness
Acceptance/rejection probabilities
(a) Linear and (b) log plots of probability of user input acceptance as a function of |d|/L for N = 50. (b) also includes the probability of rejection. In each case, the steepest (jagged) curve is the single-value vector (case 3), the middle curve is the Zipf vector (case 2), and the shallow curve is the uniform vector (case 1).
Performance Evaluation
(a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.
SVD
Singular value decomposition is an extremely useful tool for many IR and data mining tasks (CF, clustering, …).
The SVD of a matrix A is a factorization $A = UDV^T$.
If A encodes users × items, then $V^T$ gives us the best least-squares approximations to the rows of A in a user-independent way.
$A^TAV = VD^2$: SVD is an eigenproblem.
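Substituting the factorization and using that U and V have orthonormal columns ($U^TU = V^TV = I$) and that D is diagonal makes the eigenproblem explicit. The columns of V are eigenvectors of $A^TA$, with eigenvalues equal to the squared singular values:

```latex
A^T A = (U D V^T)^T (U D V^T) = V D U^T U D V^T = V D^2 V^T,
\qquad\text{so}\qquad A^T A V = V D^2 V^T V = V D^2 .
```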
SVD: P4P Style
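One way the eigenproblem fits the addition-only framework, sketched under our own assumptions: if user i holds row a_i of A, then $A^TAv = \sum_i a_i (a_i^T v)$, so each iteration of an eigensolver needs only a vector sum across users, which is what P4P's private addition provides. The sketch uses plain power iteration for the top singular value, chosen for brevity (the slides do not say which solver is used), and computes the sum in the clear.

```python
import math

def user_term(row, v):
    """User i's local contribution a_i * (a_i . v), computed from their own row only."""
    dot = sum(a * x for a, x in zip(row, v))
    return [a * dot for a in row]

def top_singular_value(rows, iters=100):
    """Power iteration on A^T A where each user holds one row a_i of A.

    Assumes the all-ones start vector is not orthogonal to the top right singular vector.
    """
    dim = len(rows[0])
    v = [1.0] * dim
    for _ in range(iters):
        # The only cross-user step: w = sum_i a_i (a_i . v) = A^T A v.
        # In P4P this vector sum would be the privately computed aggregate.
        w = [0.0] * dim
        for row in rows:
            for j, t in enumerate(user_term(row, v)):
                w[j] += t
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # sigma_1^2 = |A v|^2 = sum_i (a_i . v)^2, again a sum over users.
    sigma = math.sqrt(sum(sum(a * x for a, x in zip(row, v)) ** 2 for row in rows))
    return sigma, v

sigma, v = top_singular_value([[3.0, 0.0], [0.0, 1.0]])
print(round(sigma, 6))  # 3.0
```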
Experiments: SVD Datasets

| Dataset | Dimensions | Density | Range | Type |
|---|---|---|---|---|
| Enron | 150×150 | 0.0736 | [0, 1593] | Social graph |
| EM | 74424×1648 | 0.0229 | [0, 1.0] | Movie ratings |
| RAND | 2000×2000 | 1.0 | $[-2^{20}, 2^{20}]$ | Random |
Results
N: number of iterations. k: number of singular values. ε: relative residual error
Distributed Association Rule Mining
n users, m items. User i has dataset Di.
Horizontally partitioned: every Di contains the same attributes.
[Figure: datasets D1, …, Dn shown as binary rows over the m items, e.g. 1 0 0 0 1 … 0 0 and 0 0 1 0 0 … 1 0.]
The Market-Basket Model
• A large set of items, e.g., things sold in a supermarket.
• A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
Support
Simplest question: find sets of items that appear “frequently” in the baskets.
Support for itemset I = the number of baskets containing all items in I.
Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
Example
Items = {milk, coke, pepsi, beer, juice}. Support threshold s = 3 baskets.
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
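The example can be checked by brute force, which is fine for five items (apriori exists precisely to avoid enumerating every subset). Single letters abbreviate the item names, and the threshold is applied as support ≥ s.

```python
from itertools import combinations

baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets containing every item of itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def frequent_itemsets(baskets, s):
    """All itemsets with support >= s, found by enumerating every subset of the items."""
    items = sorted(set().union(*baskets))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if support(set(combo), baskets) >= s:
                result.append(frozenset(combo))
    return result

freq = frequent_itemsets(baskets, s=3)
print(sorted(sorted(f) for f in freq))
# [['b'], ['b', 'c'], ['b', 'm'], ['c'], ['c', 'j'], ['j'], ['m']]
```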
Association Rules
If-then rules about the contents of baskets.
{i1, i2, …, ik} → j means: “if a basket contains all of i1, …, ik, then it is likely to contain j.”
Confidence of this association rule is the probability of j given i1, …, ik.
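Estimating that conditional probability by counting gives confidence = support(I ∪ {j}) / support(I). A sketch on the baskets from the example:

```python
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets containing every item of itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def confidence(lhs, rhs_item, baskets):
    """Fraction of baskets containing lhs that also contain rhs_item."""
    return support(lhs | {rhs_item}, baskets) / support(lhs, baskets)

print(confidence({"m", "b"}, "c", baskets))  # 0.5  (2 of 4 baskets)
print(confidence({"c"}, "b", baskets))       # 0.8  (4 of 5 baskets)
```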
Step k of apriori-gen in P4P
• User i constructs an $m_k$-dimensional vector in a small field ($m_k$: number of candidate itemsets at step k)
• Use P4P to compute the aggregate (with verification)
• The result encodes the supports of all candidate itemsets
[Figure: for the jth candidate itemset cj, each user's value di[j], derived from their dataset Di (binary rows such as 1 0 0 0 1 … 0 0), is summed via P4P, d1[j] + … + dn[j], giving the support for cj.]
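A sketch of one step under stated assumptions: each user's entry di[j] is the number of their own baskets that contain candidate cj (our reading of the figure), the vectors are split into additive shares over a hypothetical field P = 2^31 - 1, and the server and privacy peer each sum the shares they receive. The baskets are those of the earlier example, split among three hypothetical users; the ZK verification step is omitted.

```python
import random

P = 2**31 - 1  # hypothetical small prime field

def local_supports(user_baskets, candidates):
    """User i's m_k-dimensional vector: d_i[j] = local support of candidate c_j."""
    return [sum(1 for b in user_baskets if c <= b) for c in candidates]

def share(vec, rng):
    """Additive shares u, v with u + v = vec (mod P)."""
    u = [rng.randrange(P) for _ in vec]
    return u, [(x - y) % P for x, y in zip(vec, u)]

candidates = [frozenset("mb"), frozenset("cb"), frozenset("cj")]
users = [  # each user holds a horizontal slice of the baskets
    [{"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}],
    [{"c", "j"}, {"m", "p", "b"}, {"m", "c", "b", "j"}],
    [{"c", "b", "j"}, {"b", "c"}],
]

rng = random.Random(1)
mu = [0] * len(candidates)  # server's running sum of u-shares
nu = [0] * len(candidates)  # privacy peer's running sum of v-shares
for user in users:
    u, v = share(local_supports(user, candidates), rng)
    mu = [(a + b) % P for a, b in zip(mu, u)]
    nu = [(a + b) % P for a, b in zip(nu, v)]

supports = [(a + b) % P for a, b in zip(mu, nu)]
print(supports)  # [4, 4, 3]
```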
Analysis
• Privacy guaranteed by P4P
• Near-optimal efficiency: cost comparable to that of a direct implementation of the algorithms
  – Main aggregation in a small field
  – Only a small number of large-field operations
• Deals with cheating users via P4P's built-in ZK user data verification
Privacy
• SVD: the intermediate sums are implied by the final results, since $A^TA = VD^2V^T$
• ARM: the sums are treated as public by the applications
• Guaranteed privacy regardless of data distribution or size
Infrastructure Support
• Multicast encryption [RSA 06]
• Scalable secure bidirectional communication [Infocom 07]
• Data protection scheme [PET 04]
P4P: Current Status
• P4P has been implemented in Java, using native code for big-integer arithmetic
• Runs on the Linux platform
• Will be made an open-source toolkit for building privacy-preserving real-world applications
Conclusion
• We can provide strong privacy protection with little or no cost to a service provider for a broad class of problems in e-commerce and knowledge work.
• Responsibility for privacy protection shifts to privacy peers.
• Within the P4P framework, private computation and many zero-knowledge verifications can be done with great efficiency.