Private Analysis of Data Sets. Benny Pinkas, HP Labs, Princeton
Post on 21-Dec-2015
2
A story
“We’re experiencing a lot of fraud lately…”
“Here too. I can’t find a pattern to recognize fraud in advance.”
“Neither can I. Maybe we should share information.”
“But what about
• Patients’ privacy
• Business secrets”
“Have you heard of “secure function evaluation”?”
“This is all “theory”. It can’t be efficient.”
3
New Opportunities for Interaction
Between
– Enterprises and government agencies holding sensitive data
– P2P users
– Mobile wireless crowds (PDAs, cell phones)
• What about privacy?
• A bidirectional approach:
– Finding what is actually needed
– Designing useful and efficient cryptographic tools
4
Cryptographic Protocols for Privacy Preserving Computation
Input: x (first party), y (second party)
Output: F(x,y) and nothing else
As if… [Figure: a trusted party receives x and y and returns F(x,y) to each party.]
5
Does the trusted party scenario make sense?
[Figure: the parties send x and y to a trusted party, which returns F(x,y) to each of them.]
• We cannot hope for more privacy
• Does the trusted party scenario make sense?
– Are the parties motivated to submit their true inputs?
– Can they tolerate the disclosure of F(x,y)?
• If so, we can implement the scenario without a trusted party.
6
Secure Function Evaluation [Yao,GMW,BGW]
Input: x (first party), y (second party)
Output: the first party learns C(x,y) and nothing else; the second party learns nothing
• F(x,y) – A public function.
• Represented as a Boolean circuit C(x,y).
Implementation:
• O(|X|) “oblivious transfers”. O(|C|) communication.
• Pretty efficient for small circuits! (But what about larger circuits?)
8
Cryptographic methods vs. randomization methods
[Figure: trade-off between three axes: inaccuracy, overhead, and lack of privacy. Plotted: randomization methods [statistical disclosure, AS], cryptographic methods, and “our goal”.]
9
Examples of Simple Privacy Preserving Primitives (with reasonable solutions)
• Is X = Y? Is X > Y?
• What is X ∩ Y? What is the median of X ∪ Y?
• Auctions (negotiations). Many parties, private bids. Compute the winning bidder and the sale price, but nothing else. [NPS]
• Voting
• Add privacy to data mining algs (ID3 – [LP])
11
Applications of Set Intersection
[Figure: Government agency A holds a list of people on welfare; government agency B holds a list of expensive car buyers. They compute the intersection and nothing else.]
12
Computing the Intersection
• Private Equality Test (PET)
– Alice: x. Bob: y.
– Output: 1 iff x=y
– Privacy preserving solutions:
  • Cannot use hash functions alone
  • Yao, [FNW], [NP]
• Generalization: list intersection
– X = x1, …, xn   Y = y1, …, yn
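The slide does not say why hashing alone fails. One standard reason is that inputs often come from a small, guessable domain, so whoever receives a hash can simply hash every candidate and compare. The sketch below illustrates that with SHA-256 over 4-digit identifiers; the function names and the 4-digit domain are assumptions made up for the illustration.

```python
import hashlib

def h(value: str) -> str:
    """Plain, unkeyed hash that a naive equality test might exchange."""
    return hashlib.sha256(value.encode()).hexdigest()

def recover_from_hash(target_hash, candidates):
    """Dictionary attack: hash every candidate and compare with the target."""
    for candidate in candidates:
        if h(candidate) == target_hash:
            return candidate
    return None

# Alice sends h(x) so that Bob can test x == y. If x is a 4-digit identifier,
# Bob needs at most 10,000 hash evaluations to learn x itself.
alice_message = h("0427")
four_digit_ids = (f"{i:04d}" for i in range(10_000))
recovered = recover_from_hash(alice_message, four_digit_ids)
```

Here `recovered` equals Alice's input even though Bob's own value was different, which is exactly what a private equality test must prevent.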
13
The basic tool: Homomorphic Encryption
• Semantically secure public key encryption
• Given Enc(M1), Enc(M2), can compute (without knowing the decryption key)
– Enc(M1+M2)
– Enc(c·M1) for any constant c.
– I.e. Enc(a0) + Enc(a1)·x + … + Enc(an)·x^n = Enc(P(x))
• Examples: El Gamal, Paillier, DJ.
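As a rough illustration of these properties, here is a toy Paillier sketch. The key size (two small Mersenne primes), the simplified key generation, and the helper names `enc`, `dec`, `add`, `scale` and `eval_encrypted_poly` are all assumptions chosen for readability; nothing here is secure or taken from the talk.

```python
import math
import secrets

# Toy Paillier parameters: two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                    # public key
N2 = N * N
PHI = (P - 1) * (Q - 1)      # private key
MU = pow(PHI, -1, N)         # modular inverse (needs Python 3.8+)

def enc(m):
    """Enc(m) = (1 + m*N) * r^N mod N^2 for a fresh random unit r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2

def dec(c):
    """Recover m from a ciphertext using the private values PHI and MU."""
    return (pow(c, PHI, N2) - 1) // N * MU % N

def add(c1, c2):
    """Enc(M1), Enc(M2) -> Enc(M1 + M2): multiply the ciphertexts."""
    return c1 * c2 % N2

def scale(c, k):
    """Enc(M), constant k -> Enc(k * M): raise the ciphertext to the power k."""
    return pow(c, k % N, N2)

def eval_encrypted_poly(enc_coeffs, x):
    """Enc(a0) + Enc(a1)*x + ... + Enc(an)*x^n by Horner's rule.
    enc_coeffs lists the encrypted coefficients low-order first."""
    acc = enc_coeffs[-1]
    for c in reversed(enc_coeffs[:-1]):
        acc = add(scale(acc, x), c)
    return acc
```

For example, `dec(add(enc(20), enc(22)))` and `dec(scale(enc(7), 6))` both give 42, and evaluating the encrypted polynomial 1 + 2x + 3x^2 at x = 5 decrypts to 86, all without the evaluator touching `PHI` or `MU`.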
14
The Scenario
• Client: X = x1, …, xn
• Server: Y = y1, …, yn
• Output:
– Client learns X ∩ Y.
– Server learns nothing.
15
The Protocol
• Client defines a polynomial of degree n whose roots are x1,…,xn
– P(y) = (x1-y)·(x2-y)·…·(xn-y) = an·y^n + … + a1·y + a0
• Sends to server homomorphic encryptions of coefficients
– Enc(an),…, Enc(a0)
• (only the client can decrypt)
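The client's first step is ordinary polynomial arithmetic. A minimal sketch, using plain integers and no encryption, of expanding P(y) = (x1-y)·…·(xn-y) into its coefficients; `poly_from_roots` and `evaluate` are hypothetical helper names.

```python
def poly_from_roots(roots):
    """Coefficients [a0, a1, ..., an] of P(y) = (x1 - y)(x2 - y)...(xn - y)."""
    coeffs = [1]
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] += x * a        # contribution of the constant term x
            nxt[i + 1] -= a        # contribution of the -y term
        coeffs = nxt
    return coeffs

def evaluate(coeffs, y):
    """Evaluate a0 + a1*y + ... + an*y^n by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * y + a
    return acc

coeffs = poly_from_roots([2, 3])   # (2 - y)(3 - y) = 6 - 5y + y^2
```

Every client item is a root, so `evaluate(coeffs, 2)` and `evaluate(coeffs, 3)` are 0, while any other input gives a nonzero value.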
16
…The Protocol
• Server uses homomorphic properties to compute, for every y ∈ Y, Enc(r·P(y) + y) (r is random)
• If y ∈ X ∩ Y the result is Enc(r·0+y) = Enc(y), otherwise the result is Enc(random).
• Server sends (permuted) results to C.
• C decrypts, compares to its list.
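Putting the two protocol slides together, the whole exchange can be sketched end to end. This is only an illustration under stated assumptions: toy Paillier with tiny hard-coded primes stands in for the homomorphic scheme, both roles run in one process, and the function names `client_round1`, `server_round` and `client_round2` are invented here.

```python
import math
import secrets

# Toy Paillier keys (Mersenne primes; far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)
N2, MU = N * N, pow(PHI, -1, N)

def enc(m):
    """Encrypt under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2

def dec(c):
    """Decrypt with the client's private values PHI and MU."""
    return (pow(c, PHI, N2) - 1) // N * MU % N

def client_round1(xs):
    """Encrypted coefficients (low-order first) of P(y) = (x1-y)...(xn-y)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + x * a) % N
            nxt[i + 1] = (nxt[i + 1] - a) % N
        coeffs = nxt
    return [enc(a) for a in coeffs]

def server_round(enc_coeffs, ys):
    """For every y, compute Enc(r*P(y) + y) with a fresh random r, then shuffle."""
    replies = []
    for y in ys:
        acc = enc_coeffs[-1]
        for c in reversed(enc_coeffs[:-1]):       # Horner's rule on ciphertexts
            acc = pow(acc, y % N, N2) * c % N2    # Enc(acc*y + coefficient)
        r = secrets.randbelow(N - 1) + 1
        replies.append(pow(acc, r, N2) * enc(y) % N2)
    secrets.SystemRandom().shuffle(replies)
    return replies

def client_round2(xs, replies):
    """Decrypt the replies and keep the values that appear in the client's list."""
    return set(xs) & {dec(c) for c in replies}

client_items = [3, 17, 42, 99]
server_items = [5, 17, 99, 1000]
common = client_round2(client_items,
                       server_round(client_round1(client_items), server_items))
```

With these inputs `common` is {17, 99}. Replies for the server's other items decrypt to effectively random values modulo N, so they match the client's list only with negligible probability.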
17
Security
• Bad server? The server only sees semantically secure encryptions. Learning about C’s input = breaking the encryption.
• Bad client? The client can, given only the output X ∩ Y, simulate her “view” in the protocol. (I.e. she generates encryptions of items in X ∩ Y, and of random items.)
18
Efficiency
• Client encrypts and decrypts n values
• Communication is O(n)
• Server:
– For each input computes Enc(r·P(y)+y), i.e. n exponentiations.
– Total O(n^2) exponentiations
– Can use hashing to reduce overhead to O(n ln ln n).
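The hashing idea can be sketched at the plaintext level: the client hashes its items into buckets and defines one low-degree polynomial per bucket, and the server evaluates only the polynomial of the bucket that y hashes to. The sketch below uses a single SHA-256-based hash and unencrypted polynomials, both assumptions made for brevity, so it shows where the savings come from rather than the exact scheme behind the O(n ln ln n) figure. A real protocol would also have to pad every bucket to the same size so that bucket sizes leak nothing.

```python
import hashlib

def bucket_of(item, num_buckets):
    """Public hash function that both parties can compute."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def bucketed_intersection(client_items, server_items, num_buckets):
    """Return (matches, work), where work counts polynomial terms evaluated."""
    # Client side: the roots of one small polynomial per bucket.
    buckets = [[] for _ in range(num_buckets)]
    for x in client_items:
        buckets[bucket_of(x, num_buckets)].append(x)

    matches, work = [], 0
    for y in server_items:
        roots = buckets[bucket_of(y, num_buckets)]
        work += len(roots)          # cost is the bucket degree, not n
        value = 1
        for x in roots:             # P_bucket(y) = product of (x - y)
            value *= (x - y)
        if value == 0:
            matches.append(y)
    return matches, work

client = list(range(0, 128, 2))     # 64 items
server = list(range(0, 192, 3))     # 64 items
matches, work = bucketed_intersection(client, server, num_buckets=16)
```

Without hashing the server would touch all 64 coefficients for each of its 64 items; with buckets, `work` is the sum of the sizes of the buckets actually visited.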
19
Is Approximation easier?
• Can we approximate size of intersection (i.e. scalar product) with sublinear overhead?
• Lower bound:
– Approximating |X ∩ Y| within a 1 ± ε factor requires Ω(n) communication (constant ε).
– True even for randomized algorithms.
– Proof: reduction to Razborov’s lower bound for Disjointness.
• Upper bound: protocols with matching overhead.
21
Secure Computation of the Kth-ranked element
• Inputs:
– A: SA   B: SB
– Large sets of unique items (∈ D).
– There’s also the multi-party scenario
• Output: x ∈ SA ∪ SB s.t. |{y | y < x, y ∈ SA ∪ SB}| = k-1
• Median: k = (|SA| + |SB|) / 2
22
Motivation
• Basic statistical analysis of distributed data
• E.g. histogram of salaries in competing businesses in the same area
• Sometimes the parties might want to hide the size of their inputs
23
Some information is always revealed
• The Kth-ranked element reveals some information
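• Suppose SA = x1,…,x1000 (in increasing order)
– Median of SA ∪ SB = x400
• Party A now learns that SB contains at least 200 elements smaller than x400. (If b elements of SB lie below x400, its rank in the union is 400 + b = (1000 + |SB|)/2, so b = 100 + |SB|/2; since b ≤ |SB|, this forces |SB| ≥ 200 and hence b ≥ 200.)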
• But she shouldn’t learn more
24
Results, and previous work
• Previous work: generic constructions – overhead at least linear in k.
• New results:
– Two-party: log k secure comparisons of log D bit numbers.
– Multi-party: log D simple computations with log D bit numbers.
25
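An (insecure) two-party median protocol
[Figure: SA is split at its median mA into a lower part LA and an upper part RA; SB is split at its median mB into LB and RB. Case shown: mA < mB.]
• LA lies below the median, RB lies above the median.
• After LA and RB are removed, the new median is the same as the original median.
• Recursion: need log n rounds (suppose each set contains 2^i items).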
26
Secure two-party median protocol
A finds median of SA, call it mA
B finds median of SB, call it mB
Secure comparison (e.g. a small circuit): mA < mB?
• YES: A deletes x ∈ SA s.t. x < mA. B deletes x ∈ SB s.t. x > mB.
• NO: A deletes x ∈ SA s.t. x > mA. B deletes x ∈ SB s.t. x < mB.
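The control flow of this protocol can be simulated in a few lines. In the sketch below a plain `<` stands in for the secure comparison, so it is not private; it only shows which halves are discarded in each round. It assumes both sets have the same power-of-two size and that all items are distinct. To keep the sizes equal it takes each party's median to be the lower median and discards it together with the lower half, which is one way (an assumption) to make the slide's rule concrete.

```python
def compare(m_a, m_b):
    """Stand-in for the secure comparison: the only bit the parties would learn."""
    return m_a < m_b

def two_party_median(set_a, set_b):
    """Median (element of rank (|SA|+|SB|)/2) of the union of the two sets."""
    sa, sb = sorted(set_a), sorted(set_b)
    size = len(sa)
    assert size == len(sb) and size > 0 and size & (size - 1) == 0
    while len(sa) > 1:
        half = len(sa) // 2
        m_a, m_b = sa[half - 1], sb[half - 1]     # each party's local median
        if compare(m_a, m_b):
            sa, sb = sa[half:], sb[:half]         # A drops its low half, B its high half
        else:
            sa, sb = sa[:half], sb[half:]         # A drops its high half, B its low half
    return min(sa[0], sb[0])                      # one final comparison

result = two_party_median([1, 5, 9, 13], [2, 3, 20, 30])
```

For these inputs the union is 1, 2, 3, 5, 9, 13, 20, 30 and `result` is 5, the element of rank 4, reached after two halving rounds and one final comparison.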
27
Proof of security
• Simulation: Given the protocol’s output, each party can simulate the execution of the protocol
[Figure: SA with the median marked. First comparison: mA < mB. Second comparison: mA > mB.]
28
Arbitrary inputs, arbitrary k
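[Figure: each of SA and SB is cut down to its k smallest items, with +∞ padding where a set has fewer than k items, and then padded with −∞ / +∞ values up to a power-of-2 size 2^i.]
• Now, compute the median of two sets of size k
• Size should be a power of 2
• median of new inputs = kth element of original inputs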
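One way to realise this reduction is sketched below. The padding details are an assumption, since the slide's figure did not survive extraction: each party keeps its k smallest items and pads with +∞ up to size k, and then A appends further +∞ values while B prepends the same number of −∞ values to reach a power-of-two size. The median routine again uses a plain `<` in place of the secure comparison.

```python
import math

def two_party_median(sa, sb):
    """Median of two sorted lists of equal power-of-two size
    (a plain '<' stands in for the secure comparison)."""
    while len(sa) > 1:
        half = len(sa) // 2
        if sa[half - 1] < sb[half - 1]:
            sa, sb = sa[half:], sb[:half]
        else:
            sa, sb = sa[:half], sb[half:]
    return min(sa[0], sb[0])

def kth_ranked(set_a, set_b, k):
    """k-th smallest element of the union, via the median protocol."""
    assert 1 <= k <= len(set_a) + len(set_b)
    size = 1
    while size < k:                     # smallest power of two >= k
        size *= 2

    def k_smallest(items):
        kept = sorted(items)[:k]        # items ranked above k cannot be the answer
        return kept + [math.inf] * (k - len(kept))

    # The extra paddings cancel out: the same number of -inf and +inf values.
    sa = k_smallest(set_a) + [math.inf] * (size - k)
    sb = [-math.inf] * (size - k) + k_smallest(set_b)
    return two_party_median(sa, sb)

third = kth_ranked([1, 5, 9], [2, 3, 20, 30, 40], k=3)
```

Here `third` is 3, the third smallest item of the union. The padded sets have size 4, and their median has rank 4, which is the single −∞ followed by the three smallest real items.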
29
Conclusions
• Efficient privacy preserving primitives for basic tasks
• Open problems
– Intersection: approximate matching?
– Median: clustering?
• Theory and applications can and should interact
– Tools from the theory of cryptography (e.g. SFE) can be used in applications
– Applications can benefit from rigorous analysis
• There’s a lot more to be done…