Private Analysis of Data Sets Benny Pinkas HP Labs, Princeton

Post on 21-Dec-2015


Private Analysis of Data Sets

Benny Pinkas, HP Labs, Princeton

2

A story

We’re experiencing a lot of fraud lately…
Here too…
I can’t find a pattern to recognize fraud in advance.
Neither can I.
Maybe we should share information.
But what about
• Patients’ privacy
• Business secrets
Have you heard of “secure function evaluation”?
This is all “theory”. It can’t be efficient.

3

New Opportunities for Interaction

• Between
– Enterprises and government agencies holding sensitive data
– P2P users
– Mobile wireless crowds (PDAs, cell phones)
• What about privacy?
• A bidirectional approach:
– Finding what is actually needed
– Designing useful and efficient cryptographic tools

4

Cryptographic Protocols for Privacy Preserving Computation

[Figure: two parties hold inputs x and y. Output: F(x,y) and nothing else. As if… the inputs were handed to a trusted party that returns F(x,y) to each party.]

5

Does the trusted party scenario make sense?

[Figure: a trusted party receives x and y and returns F(x,y) to each party.]
• We cannot hope for more privacy
• Does the trusted party scenario make sense?
– Are the parties motivated to submit their true inputs?
– Can they tolerate the disclosure of F(x,y)?

• If so, we can implement the scenario without a trusted party.

6

Secure Function Evaluation [Yao,GMW,BGW]

Input: x (first party), y (second party)
Output: C(x,y) and nothing else (first party), nothing (second party)

• F(x,y) – A public function.
• Represented as a Boolean circuit C(x,y).

Implementation:
• O(|X|) “oblivious transfers”. O(|C|) communication.
• Pretty efficient for small circuits! (But what about larger circuits?)

7

An equality circuit

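[Figure: equality circuit. Gates (x1 = y1), (x2 = y2), …, (xn = yn) feed a single AND gate, which computes x = y: the output is 1 if x = y, 0 otherwise.]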
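The equality circuit on this slide can be sketched in the clear as follows. This is only the Boolean function C(x,y), evaluated on plaintext bits; it is not a secure evaluation of it. The helper names `to_bits` and `equality_circuit` are made up for illustration.

```python
def to_bits(value, width):
    """Little-endian bit decomposition of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def equality_circuit(x_bits, y_bits):
    """AND of n bitwise equality gates: returns 1 if x == y, else 0."""
    if len(x_bits) != len(y_bits):
        raise ValueError("inputs must have the same number of bits")
    # One "=" gate per bit position (XNOR of the two input bits).
    eq_gates = [1 - (a ^ b) for a, b in zip(x_bits, y_bits)]
    # A single AND over all equality gates.
    out = 1
    for gate in eq_gates:
        out &= gate
    return out


same = equality_circuit(to_bits(13, 8), to_bits(13, 8))
different = equality_circuit(to_bits(13, 8), to_bits(12, 8))
```

The circuit has n equality gates and one n-input AND, so its size is linear in the input length, which is the regime the previous slide calls "pretty efficient".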

8

Cryptographic methods vs. randomization methods

[Figure: a trade-off among three costs: inaccuracy, overhead, and lack of privacy. Points shown: randomization methods [statistical disclosure, AS], cryptographic methods, and “our goal…”.]

9

Examples of Simple Privacy Preserving Primitives (with reasonable solutions)

• Is X = Y? Is X > Y?
• What is X ∩ Y? What is median of X ∪ Y?
• Auctions (negotiations). Many parties, private bids. Compute the winning bidder and the sale price, but nothing else. [NPS]
• Voting
• Add privacy to data mining algs (ID3 – [LP])

Private Set Intersection

with Mike Freedman, NYU, and Kobbi Nissim, MSR

11

Applications of Set Intersection

Government agency A: people on welfare
Government agency B: expensive car buyers
Compute the intersection and nothing else

12

Computing the Intersection

• Private Equality Test (PET)
– Alice: x. Bob: y.
– Output: 1 iff x=y
– Privacy preserving solutions:
  • Cannot use hash functions alone
  • Yao, [FNW], [NP]
• Generalization: list intersection
– X = x1, …, xn   Y = y1, …, yn
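The slide does not spell out why hashing alone fails. One well-known reason, assumed here to be the intended one, is that inputs usually come from a small domain, so whoever receives a hash can enumerate the domain. The sketch below uses SHA-256 and a 4-digit input purely as an illustration; `h` and `recover` are hypothetical names.

```python
import hashlib


def h(value):
    """Unkeyed hash of a value (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def recover(digest, domain):
    """Dictionary attack: try every candidate until one hashes to the digest."""
    for candidate in domain:
        if h(candidate) == digest:
            return candidate
    return None


alice_x = 4821                       # Alice's private input, from a small domain
sent_to_bob = h(alice_x)             # naive "equality test": Alice sends h(x)
bob_learns = recover(sent_to_bob, range(10_000))
```

Bob recovers x even when x differs from his own y, so the exchange reveals far more than the single bit "x = y".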

13

The basic tool: Homomorphic Encryption

• Semantically secure public key encryption
• Given Enc(M1), Enc(M2), can compute (without knowing the decryption key)
– Enc(M1+M2)
– Enc(c·M1) for any constant c.
– I.e. Enc(a0) + Enc(a1)·x + … + Enc(an)·x^n = Enc(P(x))
• Examples: El Gamal, Paillier, DJ.
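The two homomorphic operations can be tried out with a toy version of Paillier, one of the schemes the slide lists. This is a minimal sketch with deliberately tiny primes and no hardening, not a usable implementation; the function names (`keygen`, `encrypt`, `decrypt`, `add`, `scale`) are chosen here for illustration.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two small Mersenne primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # valid for generator g = n + 1
    return n, (lam, mu)                                 # public key n, private key (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m % n, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add(n, c1, c2):
    """Enc(M1) and Enc(M2) -> Enc(M1 + M2): multiply the ciphertexts."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Enc(M) and a constant k -> Enc(k * M): raise the ciphertext to the k-th power."""
    return pow(c, k, n * n)


n, priv = keygen()
sum_plain = decrypt(n, priv, add(n, encrypt(n, 20), encrypt(n, 22)))
scaled_plain = decrypt(n, priv, scale(n, encrypt(n, 7), 6))
```

Neither `add` nor `scale` touches the private key, which is the property the protocol slides rely on. Encryption is randomized, so two encryptions of the same value look unrelated.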

14

The Scenario

• Client: X = x1, …, xn

• Server: Y = y1, …, yn

• Output:
– Client learns X ∩ Y.
– Server learns nothing.

15

The Protocol

• Client defines a polynomial of degree n whose roots are x1,…,xn

– P(y) = (x1-y)·(x2-y)·…·(xn-y) = an·y^n + … + a1·y + a0
• Sends to server homomorphic encryptions of coefficients
– Enc(an), …, Enc(a0)
• (only the client can decrypt)

16

…The Protocol

• Server uses homomorphic properties to compute, for every y ∈ Y, Enc(r·P(y) + y) (r is random)
• If y ∈ X ∩ Y, the result is Enc(r·0+y) = Enc(y); otherwise the result is Enc(random).
• Server sends (permuted) results to C.
• C decrypts, compares to its list.
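The two protocol slides can be put together in a short end-to-end sketch. It assumes a toy Paillier instance as the homomorphic scheme (tiny primes, semi-honest parties, no padding or proofs), evaluates the encrypted polynomial with Horner's rule, and draws a fresh r for every y. All function names are illustrative and not taken from the talk.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):              # toy primes, illustration only
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))


def encrypt(n, m):
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m % n, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def client_coefficients(xs, n):
    """Coefficients a_0..a_n (mod n) of P(y) = (x1 - y)(x2 - y)...(xn - y)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + x * a) % n          # contribution of x * a_i * y^i
            nxt[i + 1] = (nxt[i + 1] - a) % n      # contribution of -a_i * y^(i+1)
        coeffs = nxt
    return coeffs


def server_respond(n, enc_coeffs, ys):
    """For every y, compute Enc(r * P(y) + y) from the encrypted coefficients."""
    n2 = n * n
    replies = []
    for y in ys:
        acc = enc_coeffs[-1]                        # Enc(a_n)
        for c in reversed(enc_coeffs[:-1]):
            acc = pow(acc, y, n2) * c % n2          # Horner step: Enc(acc * y + a_k)
        r = secrets.randbelow(n - 1) + 1            # fresh randomness per item
        replies.append(pow(acc, r, n2) * encrypt(n, y) % n2)
    secrets.SystemRandom().shuffle(replies)         # the "(permuted)" step
    return replies


def private_intersection(xs, ys):
    n, priv = keygen()                              # client side
    enc_coeffs = [encrypt(n, a) for a in client_coefficients(xs, n)]
    replies = server_respond(n, enc_coeffs, ys)     # server side
    wanted = set(xs)
    return {m for m in (decrypt(n, priv, c) for c in replies) if m in wanted}


common = private_intersection([3, 17, 42, 99], [5, 17, 99, 1000])
```

Items outside the intersection decrypt to r·P(y) + y mod n, which looks random to the client, so a false match has negligible probability even at this toy key size.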

17

Security

• Bad server? The server only sees semantically secure encryptions. Learning about C’s input = breaking the encryption.

• Bad client? The client can, given only the output X ∩ Y, simulate her “view” in the protocol. (I.e. she generates encryptions of items in X ∩ Y, and of random items.)

18

Efficiency

• Client encrypts and decrypts n values

• Communication is O(n)
• Server:
– For each input computes Enc(r·P(y)+y), i.e. n exponentiations.
– Total O(n^2) exponentiations
– Can use hashing to reduce overhead to O(n ln ln n).
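The slide does not say which hashing scheme is meant. The sketch below assumes balanced allocation: the client places each item in the less loaded of two candidate bins and builds one low-degree polynomial per bin, and the server evaluates only the two candidate bins for each of its items. It counts exponentiations on plaintext data and performs no encryption; the bin count and all names are illustrative assumptions.

```python
import hashlib


def bin_choices(item, num_bins):
    """Two candidate bins per item, derived from SHA-256 (an arbitrary choice here)."""
    return [
        int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % num_bins
        for i in (0, 1)
    ]


def allocate(xs, num_bins):
    """Client side: put each item into the currently less loaded candidate bin."""
    bins = [[] for _ in range(num_bins)]
    for x in xs:
        target = min(bin_choices(x, num_bins), key=lambda b: len(bins[b]))
        bins[target].append(x)
    return bins


def server_cost(bins, num_server_items):
    """Exponentiations when every bin polynomial is padded to the maximum load."""
    max_load = max(len(b) for b in bins)
    return 2 * max_load * num_server_items          # two bins evaluated per server item


n = 1024
xs = list(range(0, 2 * n, 2))                       # client items (made-up data)
bins = allocate(xs, num_bins=n // 4)
bucketed = server_cost(bins, n)
naive = n * n                                       # one degree-n polynomial per server item
```

Padding every bin to the same degree hides how many client items landed in each bin. The server's work then scales with the maximum bin load rather than with n.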

19

Is Approximation easier?

• Can we approximate size of intersection (i.e. scalar product) with sublinear overhead?
• Lower bound:
– Approximating |X ∩ Y| within a 1 ± ε factor requires Ω(n) communication (constant ε).
– True even for randomized algorithms.
– Proof: reduction to Razborov’s lower bound for Disjointness.
• Upper bound: protocols with matching overhead.

Secure Computation of the Kth-ranked element

with Gagan Aggarwal, Stanford, and Nina Mishra, HPL

21

Secure Computation of the Kth-ranked element

• Inputs:
– A: SA   B: SB
– Large sets of unique items (from a domain D).
– There’s also the multi-party scenario
• Output: x ∈ SA ∪ SB s.t. |{y | y < x, y ∈ SA ∪ SB}| = k-1
• Median: k = (|SA| + |SB|) / 2

22

Motivation

• Basic statistical analysis of distributed data

• E.g. histogram of salaries in competing businesses in the same area

• Sometimes the parties might want to hide the size of their inputs

23

Some information is always revealed

• The Kth-ranked element reveals some information

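• Suppose SA = x1,…,x1000 (in increasing order)
– Median of SA ∪ SB = x400
• Party A now learns that SB contains at least 200 elements smaller than x400 (x400 has 399 elements of SA below it and 600 above it, so SB must make up the difference)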

• But she shouldn’t learn more

24

Results, and previous work

• Previous work: generic constructions – overhead at least linear in k.

• New results:
– Two-party: log k secure comparisons of log D bit numbers.
– Multi-party: log D simple computations with log D bit numbers.

25

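An (insecure) two-party median protocol

[Figure: SA is split at its median mA into LA (below mA) and RA (above mA); SB is split at its median mB into LB and RB. Case shown: mA < mB.]

LA lies below the median, RB lies above the median.
With LA and RB removed, the new median is the same as the original median.
Recursion: need log n rounds (suppose each set contains 2^i items).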

26

Secure two-party median protocol

[Flowchart, one round:]
A finds median of SA, call it mA. B finds median of SB, call it mB.
Secure comparison (e.g. a small circuit): is mA < mB?
– YES: A deletes x ∈ SA s.t. x < mA. B deletes x ∈ SB s.t. x > mB.
– NO: A deletes x ∈ SA s.t. x > mA. B deletes x ∈ SB s.t. x < mB.
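The round structure can be simulated in the clear as below. The comparison `m_a < m_b` stands in for the secure comparison and is the only step where the two parties' data meet. The sketch makes three assumptions beyond the slide: both sets have the same power-of-two size, all items are distinct, and the party deleting its lower half also deletes its own median, so that each set halves exactly. The function name is illustrative.

```python
def two_party_median(sa, sb):
    """Lower median of SA ∪ SB (the element of rank |SA|), plus the comparison count."""
    a, b = sorted(sa), sorted(sb)
    size = len(a)
    if size == 0 or size != len(b) or size & (size - 1):
        raise ValueError("both sets must have the same power-of-two size")
    comparisons = 0
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]     # each party's local (lower) median
        comparisons += 1
        if m_a < m_b:                           # placeholder for the secure comparison
            a, b = a[half:], b[:half]           # A drops its lower half, B its upper half
        else:
            a, b = a[:half], b[half:]           # A drops its upper half, B its lower half
    # One item remains on each side; a final comparison picks the smaller one.
    return min(a[0], b[0]), comparisons + 1


median, rounds = two_party_median([1, 4, 9, 16], [2, 3, 5, 7])
```

Each round removes equally many items from below and from above the median, so the median of what remains is unchanged, and the number of comparisons grows with the logarithm of the set size.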

27

Proof of security

• Simulation: Given the protocol’s output, each party can simulate the execution of the protocol

[Figure: the median marked within SA. First comparison: mA < mB. Second comparison: mA > mB.]

28

Arbitrary inputs, arbitrary k

[Figure: SA and SB are trimmed or padded (labels “+”, “++” and “−”) into two sets of size k. The size should be a power of 2 (2^i).]

Now, compute the median of two sets of size k.
Median of new inputs = kth element of original inputs.
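One way to read the figure, taken here as an assumption, is that “+” and “−” stand for padding with +∞ and −∞: each party keeps its k smallest items and pads with +∞ up to size k, then A adds further +∞ items and B adds equally many −∞ items until the size is a power of two. The sketch below checks that reading in the clear, with a plain comparison in place of the secure one; all names are illustrative.

```python
import math


def lower_median(a, b):
    """Plaintext stand-in for the two-party median protocol on sorted equal-size lists."""
    while len(a) > 1:
        half = len(a) // 2
        if a[half - 1] < b[half - 1]:           # placeholder for the secure comparison
            a, b = a[half:], b[:half]
        else:
            a, b = a[:half], b[half:]
    return min(a[0], b[0])


def kth_ranked(sa, sb, k):
    """k-th smallest item of SA ∪ SB (distinct items, 1 <= k <= |SA| + |SB|)."""
    # Step 1: keep the k smallest items and pad with +inf up to size k.
    a = sorted(sa)[:k]
    b = sorted(sb)[:k]
    a = a + [math.inf] * (k - len(a))
    b = b + [math.inf] * (k - len(b))
    # Step 2: grow both lists to the next power of two.
    size = 1 << (k - 1).bit_length()
    a = a + [math.inf] * (size - k)             # extra items above everything
    b = [-math.inf] * (size - k) + b            # equally many items below everything
    # The lower median of the padded lists is the k-th item of the original union.
    return lower_median(a, b)


third = kth_ranked([10, 20, 30], [5, 15, 25, 35, 45], k=3)
```

Adding the same number of items below and above everything shifts the median rank by exactly the number of −∞ items, which is why the median of the padded inputs lands on the k-th original item.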

29

Conclusions

• Efficient privacy preserving primitives for basic tasks
• Open problems
– Intersection: approximate matching?
– Median: clustering?
• Theory and applications can and should interact
– Tools from the theory of cryptography (e.g. SFE) can be used in applications
– Applications can benefit from rigorous analysis

• There’s a lot more to be done…