secure multiparty computation – applications for privacy...

56
Secure Multiparty Computation – Applications for Privacy Preserving Distributed Data Mining Li Xiong CS573 Data Privacy and Security Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT Dallas

Upload: others

Post on 05-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Multiparty Computation –

Applications for Privacy Preserving

Distributed Data Mining

Li Xiong

CS573 Data Privacy and Security

Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT Dallas

Page 2: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Distributed Data Mining

• The setting:

– Data is distributed at different sites

– These sites may be third parties (e.g., hospitals, government (e.g., hospitals, government bodies) or may be the individual him or herself

• Privacy and security constraints:

– Privacy of individual data

– Security of data holders

xn

x1

x3

x2

f(x1,x2,…, xn)

Page 3: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Distributed Data Mining

• Government / public agencies. Example:

– The Centers for Disease Control want to identify disease outbreaks

– Insurance companies have data on disease incidents, seriousness, patient background, etc.

– But can/should they release this information?

• Industry Collaborations / Trade Groups. Example:• Industry Collaborations / Trade Groups. Example:

– An industry trade group may want to identify best practices to help members

– But some practices are trade secrets

– How do we provide “commodity” results to all (Manufacturing using chemical supplies from supplier X have high failure rates), while still preserving secrets (manufacturing process Y gives low failure rates)?

Page 4: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Taxonomy

• Data partition– Horizontally partitioned– Vertically partitioned

• Techniques– Data perturbation– Data perturbation

– Secure Multi-party Computation protocols

• Mining applications– Decision tree– Bayes classifier

– …

Page 5: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Horizontally Partitioned Data

• Data can be unioned to create the complete

set key X1…Xd

K1

k2

kn

key X1…Xd

Site 1

key X1…Xd

Site 2

key X1…Xd

Site r…

kn

K1

k2

ki

Ki+1

ki+2

kj

Km+1

km+2

kn

Page 6: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Vertically Partitioned Data

• Data can be joined to create the complete set

key X1…Xi Xi+1…Xj … Xm+1…Xd

key X1…Xi

Site 1

key Xi+1…Xj

Site 2

key Xm+1…Xd

Site r…

Page 7: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Multiparty Computation

• Goal: Compute function when each party has some of the inputs

• Secure– Can be simulated by ideal model - nobody knows

anything but their own input and the resultsanything but their own input and the results

– Formally: ∃ polynomial time S such that {S(x,f(x,y))} ≡ {View(x,y)}

• Semi-Honest model: follow protocol, but remember intermediate exchanges

• Malicious: “cheat” to find something out

Page 8: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Application of SMC to Private Data

Mining

• The aim:

– Compute the data mining algorithm on the data so that nothing but the output is learned

• Is it sufficient?• Is it sufficient?

xn

x1

x3

x2

f(x1,x2,…, xn)

Page 9: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Privacy and Secure Computation

• Privacy ≠≠≠≠ Security

– Secure computation only deals with the process of

computing the function

– It does not ask whether or not the function should be

computedcomputed

• A two-stage process:

– Decide that the function/algorithm should be

computed – an issue of privacy (e.g. differential

privacy)

– Apply secure computation techniques to compute it

securely – security (focus of this class)

Page 10: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Multiparty Computation

• Basic cryptographic tools– Oblivious transfer

– Random shares

– Oblivious circuit evaluation– Oblivious circuit evaluation

• Yao’s Millionaire’s problem (Yao ’86)– Secure computation possible if function can be

represented as a circuit

• Works for multiple parties as well (Goldreich, Micali, and Wigderson ’87)

Page 11: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

But we aren’t done yet

• Circuit evaluation: Build a circuit that represents the computation

– For all possible inputs

– Impossibly large for typical data mining tasks– Impossibly large for typical data mining tasks

• Next step:

– Efficient techniques for specialized tasks and computations

– Tradeoff between security, efficiency, and accuracy

Page 12: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Outline

• Privacy preserving two-party decision tree

mining (Lindell & Pinkas ’00)

• Primitive protocols

– Secure sum– Secure sum

– Secure union (encryption based)

– Secure max (probabilistic random response

based)

• Secure data mining using sub protocolsPrivacy preserving data mining, Lindell, 2000

Tools for Privacy Preserving Data Mining, Clifton, 2002

Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, Kantarcioglu, 2004

Page 13: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Decision Tree Construction (Lindell &

Pinkas ’00)

• Two-party horizontal partitioning

– Each site has same schema

– Attribute set known

– Individual entities private– Individual entities private

• Learn a decision tree classifier

– ID3

• Semi-honest model

Page 14: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

A Decision Tree for “buys_computer”

age?

overcast<=30>4031..40

14

student? credit rating?

no yes yes

yes

fairexcellentyesno

Page 15: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

ID3 Algorithm for Decision Tree Induction

� Greedy algorithm - tree is constructed in a top-down recursive divide-and-conquer

manner

� At start, all the training examples are at the root

� A test attribute is selected that “best” separate the data into partitions -

information gain

� Samples are partitioned recursively based on selected attributes

15

� Conditions for stopping partitioning

� All samples for a given node belong to the same class

� There are no remaining attributes for further partitioning – majority voting is

employed for classifying the leaf

� There are no samples left

Page 16: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Privacy Preserving ID3

Step 1: If R is empty, return a leaf-node with the class value with

Input:

� R – the set of attributes

� C – the class attribute

� T – the set of transactions

Step 1: If R is empty, return a leaf-node with the class value with

the most transactions in T

• Set of attributes is public

– Both know if R is empty

• Use secure protocol for majority voting

• Yao’s protocol

– Inputs (|T1(c1)|,…,|T1(cL)|), (|T2(c1)|,…,|T2(cL)|)

– Output i where |T1(ci)|+|T2(ci)| is largest

Page 17: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Privacy Preserving ID3

Step 2: If T consists of transactions which have all the same value

c for the class attribute, return a leaf node with the value c

• Represent having more than one class (in the transaction set),

by a fixed symbol different from ci,

• Force the parties to input either this fixed symbol or ci• Force the parties to input either this fixed symbol or ci

• Check equality to decide if at leaf node for class ci

• Various approaches for equality checking

– Yao’86

– Fagin, Naor ’96

– Naor, Pinkas ‘01

Page 18: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Privacy Preserving ID3

• Step 3:(a) Determine the attribute that best classifies the transactions in T, let it be A

– Each party compute P1 and P2 , i.e., x1 and x2, respectively

– Essentially done by securely computing x*(ln x) where x is the sum of

∑∑∈∈

−=−=)()(

log)(log)()(Slabelv

vv

Slabelv n

n

n

nvPvPSEntropy

– Essentially done by securely computing x*(ln x) where x is the sum of values from the two parties

• Step 3: (b,c) Recursively call ID3δ for the remaining attributes on the transaction sets T(a1),…,T(am) where a1,…, am are the values of the attribute A– Since the results of 3(a) and the attribute values are public, both

parties can individually partition the database and prepare their inputs for the recursive calls

Page 19: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Security proof tools

– Real/ideal model: the real model can be simulated in

the ideal model

• Key idea – Show that whatever can be computed by a party participating in the protocol can be computed based on its input and output onlybased on its input and output only

• ∃ polynomial Tme S such that {S(x,f(x,y))} ≡ {View(x,y)}

Page 20: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Security proof tools

• Composition theorem

– if a protocol is secure in the hybrid model where

the protocol uses a trusted party that computes

the (sub) functionalities, and we replace the calls the (sub) functionalities, and we replace the calls

to the trusted party by calls to secure protocols,

then the resulting protocol is secure

– Prove that component protocols are secure, then prove that the combined protocol is secure

Page 21: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Sub-protocols for PPDDM

• In general, PPDDM protocols depend on few

common sub-protocols.

• Those common sub-protocols could be re-• Those common sub-protocols could be re-

used to implement PPDDM protocols

Page 22: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Privacy preserving data mining toolkit (Clifton ‘02)

• Many different data mining techniques often perform similar computations at various stages (e.g., computing sum, counting the number of items)

• Toolkit • Toolkit

– simple computations – sum, union, intersection …

– assemble them to solve specific mining tasks –association rule mining, bayes classifier, …

• The protocols may not be truly secure but more efficient than traditional SMC methods

Tools for Privacy Preserving Data Mining, Clifton, 2002

Page 23: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Primitive protocols

• Secure functions

– Secure sum

– Secure union

– …– …

Page 24: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Sum

• Leading site: (v1 + R) mod n

• Site l receives:

• Leading site: (V-R) mod n

Page 25: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Sum

Page 26: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure sum - security

• Does not reveal the real number

• Is it secure?� Site can collude!

� Each site can divide the number into shares, and � Each site can divide the number into shares, and

run the algorithm multiple times with permutated

nodes

Page 27: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Union

• Commutative encryption

– For any set of permutated keys and a message M

– For any set of permutated keys and message M1 and M2For any set of permutated keys and message M1 and M2

• Secure union

– Each site encrypt its items and items from other site, remove duplicates, and decrypt

Page 28: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

E1(E2(E3(ABC)))E1(ABC)E1(E2(ABD))

Secure Union

1

ABC

E1(E2(E3(ABC)))

E1(E2(E3(ABD)))

E2(E3(ABC))

E2(E3(ABD))

E3(E1(ABC))E3(E1(E2(ABD)))E2(E3(ABC))E2(E3(E1(ABC))) 2

ABD

3

ABCE3(ABC)E2(ABD)

E3(ABC)

E3(ABD)

ABC

ABD

Page 29: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Union Security

• Does not reveal which item belongs to which site

• Is it secure under the definition of secure multi-party computation?multi-party computation?

� It reveals the number of items that are common

in the sites!

� Revealing innocuous information leakage allows a

more efficient algorithm than a fully secure

algorithm

Page 30: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Random Shares based Secure Union• Phase 1: random item addition

– Multiple rounds with permutated ring

– Each node sends a random share of its item set and a random share of a random

item set

• Phase 2: random item removal

– Each node subtracts its random items set

30

Page 31: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Random Shares based Secure Union -

Analysis

• Item exposure attack

– An adversary makes a claim C on a particular item a node i contributes to the final result (C: vi in xi)

• Set exposure attack

– An adversary makes a claim C on the whole set of – An adversary makes a claim C on the whole set of items a node i contributes to the final union result X (C: xi = ai).

• Change of beliefs (posterior probability and prior probability)

– P(C|IR,X) - P(C|X)

– P(C|IR,X)/P(C|X)

31

Page 32: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Exposure Risk – Set Exposure

• Disclosure decreases with increasing number of generated

random items and increasing number of participating nodes

• Set exposure risk is or close to 0 for probabilistic and crypto

approach

32

Page 33: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Exposure Risk – Risk Exposure

• Item exposure risk decreases with increasing number of

generated random items and participating nodes

• Item exposure risk for probabilistic approach is quite high

33

Page 34: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Cost Comparison

• Commutative protocol and anonymous communication protocol efficient but sensitive to union size

• Probabilistic protocol efficient but sensitive to domain size

• Estimated runtime for the general circuit-based protocol implemented by FairplayMP framework is 15 days, 127 days and 1.4 years for the domain sizes tested

34

Page 35: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Multi-round protocols

• Multi-round probabilistic protocols for

max/min and top k (Xiong ‘07)

Page 36: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Protocol Structure

• Random response

(Warner 1965)

• Multi-round

randomized protocol

Social Survey

D1 D2

Output Input

36

randomized protocol

– Randomized local

computation

– Multi-node

anonymity

• Assumption: semi-

honest model

Private

Data

D1

Private

Data

D3

Private

Data

D2

Private

Data

Dn

OutputInput

Input

InputOutput

Output

Local

Computation

Local

Computation

Local

Computation

Local

Computation

Preserving Privacy for outsourcing aggregation services, Xiong, 2007

Page 37: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

A Naïve Max/Min Protocol

igi-1 gi

vi 1 2

30 10

30

3040

start

37

gi-1>=vi gi-1<vi

gi gi-1 vi34

20 40

30

40

40

� Add in randomization – how, when, and how

much?

Page 38: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

• Random response at node i:

Max Protocol – Random response

gi-1>=vi gi-1(r)<vi

gi(r) gi-1(r) w/ prob Pr:

random numberigi-1 gi

v

38

w/ prob 1-Pr:

vi

vi

Page 39: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

• Multiple rounds

• Randomization Probability at round r :

– Pr(r) =

• Local algorithm at round r and node i:

Max Protocol – multi-round random

response

1

0*−rdP

• Local algorithm at round r and node i:

39

gi-1(r)>=vi gi-1(r)<vi

gi(r) gi-1(r) w/ prob Pr:

rand [gi-1(r), vi)

w/ prob 1-Pr:

vi

igi-1(r) gi(r)

vi

Page 40: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Max Protocol - Illustration

Start18 3532

D2D2

30 10

0

40

32 4035

D3D4

30

20 40

10

18 3532

32 4035

Page 41: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

PrivateTopK Protocol

Gi’(r)=topk(Gi-1(r) U Vi);

Vi’ = Gi’(r) – Gi-1’(r);

m = |Vi’|;

if m=0 then

Gi(r)= Gi-1(r);

else60

80

100

125

145

110

130

150

2/28/2012 41

else

with probability 1-Pr(r):

Gi(r)= Gi-1(r)

with probability Pr:

Gi(r)[1:k-m] = Gi-1(r)[1:k-m]

Gi(r)[k-m+1:k] = a sorted list of m

random values generated from

[min(Gi’(r)[k]-delta,Gi-1(r-1)[k-m+1]),

Gi’(r)[k])

end

Gi(r)

Gi-1(r)

10

Vi

40

50

80

110

D1

D2

Dn

Page 42: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Min/Max Protocol - Correctness

• Precision bound:

– Converges with r

– Smaller p0 and d provides faster convergence

2

)1(

01*1)Pr(1

=−≥−∏

rrrr

jdPj

42

Page 43: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Min/Max Protocol - Cost

• Communication cost

– single round: O(n)

– Minimum # of rounds given

precision guarantee (1-e):

43

Page 44: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Min/Max Protocol - Security

• Probability/confidence based metric: P(C|IR,R)

– Different types of exposures based on claim

• Data value: v =a

0.50 1

Absolute Privacy Provable Exposure

44

• Data value: vi=a

• Data ownership: Vi contains a

– Loss of Privacy (LoP) = | P(C|IR,R) – P(C|R) |

• Information entropy based metric:

– Loss of privacy as a measure of randomness of information: H(D|R) -

H(D|IR,R)

Page 45: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Min/Max Protocol – Security (Analysis)

• Upper bound for average expected LoP:

max r 1/2r-1 * (1-P0*dr-1)

• Larger p0 and d provides better privacy

45

Page 46: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Min/Max Protocol – Security (Experiments)

46

• Loss of privacy decreases with increasing number of nodes

• Probabilistic protocol achieves better privacy (close to 0)

• When n is large, anonymous protocol is actually okay!

Page 47: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Union

• Commutative encryption based approach

– Number of rounds: 2 rounds

– Each round: encryption and decryption

• Multi-round random-response approach?• Multi-round random-response approach?

Page 48: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Vector

1

0b1

b2

p1

0

1

p2

1

0

pc

OR OR OR… =

1

1

VG

• Each database has a boolean vector of the data items

• Union vector is a logical OR of all vectors

0bL

0

0

0

Privacy Preserving Indexing of Documents on the Network, Bawa, 2003

Page 49: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Group Vector Protocol

Pex=1/2r, Pin=1-Pex

for(i=1; i<L; i++)

if (Vs[i]=1 and VG’[i]=0)

Processing of VG’ at ps of round r…

0

1

0

v1

0

0

1

v2

0

1

0

vc

0

0

0

vG’

0

0

1

vG’

r=1, Pex=1/2, Pin=1/2

if (Vs[i]=1 and VG’[i]=0)

Set VG’[i]=1 with prob. Pin

if (Vs[i]=0 and VG’[i]=1)

Set VG’[i]=0 with prob. Pex

v1v2

vc

r=2, Pex=1/4, Pin=3/40

0

1

vG’

0

1

1

vG’

0

1

1

vG’

0

0

1

vG’

0

1

1

vG’

p1p2 pc

Page 50: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Open issues

� Tradeoff between accuracy, efficiency, and security

� How to quantify security

� How to design adjustable protocols

� Can we generalize the algorithms for a set of � Can we generalize the algorithms for a set of

operators based on their properties

� Operators: sum, union, max, min …

� Properties: commutative, associative, invertible,

randomizable

Page 51: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Functionalities

• Secure Comparison: Comparing two integers without revealing the integer values.

• Secure Polynomial Evaluation: Party A has polynomial P(x) and Part B has a value b, the polynomial P(x) and Part B has a value b, the goal is to calculate P(b) without revealing P(x) or b

• Secure Set Intersection: Party A has set SA A

and Party B has set and Party B has set SB , the goal is to calculate

without revealing anything else. A BS S∩

Page 52: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Secure Functionalities Used

• Secure Set Union: Party A has set SA A and Party and Party

B has set B has set SB , the goal is to calculate

without revealing anything else.

A BS S∪

• Secure Dot Product: Party A has a vector X

and Party B has a vector Y. The goal is to

calculate X.Y without revealing anything else.

Page 53: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

•Secure Sum

•Secure Comparison

•Association Rule Mining

•Decision Trees

Data Mining on Horizontally

Partitioned DataSpecific Secure Tools

•Secure Union

•Secure Logarithm

•Secure Poly. Evaluation

•EM Clustering

•Naïve Bayes Classifier

Page 54: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

•Secure Comparison

•Secure Set Intersection

•Association Rule Mining

•Decision Trees

Data Mining on Vertically

Partitioned DataSpecific Secure Tools

•Secure Dot Product

•Secure Logarithm

•Secure Poly. Evaluation

•K-means Clustering

•Naïve Bayes Classifier

•Outlier Detection

Page 55: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Summary of SMC Based PPDDM

• Mainly used for distributed data mining.

• Provably secure under some assumptions.

• Efficient/specific cryptographic solutions for many distributed data mining problems are developed.developed.

• Mainly semi-honest assumption(i.e. parties follow the protocols)

Page 56: Secure Multiparty Computation – Applications for Privacy ...lxiong/cs573_s12/share/slides/0228_ppdm… · –if a protocol is secure in the hybrid model where the protocol uses

Ongoing research

• New models that can trade-off better

between efficiency and security

• Game theoretic / incentive issues in PPDM

• Combining anonymization and cryptographic

techniques for PPDM