
Background on security

Definition of security

1. Attacker’s knowledge/capability
– The attacker observes only a set of encrypted values: ciphertext-only attack (COA)
• Suitable for most real-life applications
– The attacker can generate the encrypted value of any plaintext of his choice: chosen-plaintext attack (CPA)
• Baseline for a public-key cryptosystem. The attacker can use the public key to generate as many ciphertexts as he wants

2. Attacker’s goal
– To derive information about the plaintext, where any information counts: semantic security
• E.g., knowing that one’s salary is > 50k/month, without knowing the exact value, may be a security concern
– (A malicious data mining service provider) To return a wrong answer to the user: integrity

Some facts

• There isn’t really a formal method to prove security against COA
– People prefer provable security
• There is always a brute-force attack w.r.t. CPA
– Try all the keys and find the one that matches all plaintext-ciphertext pairs
– Security under CPA means the attack is a (proven) hard problem

Views from crypto

• We do not know what the attacker knows
– Better prepare for the worst
• Require provable semantic security under a strong attack model (at least CPA)

Semantic security

• Definition: no information about the plaintext (except the size) is leaked to the attacker
• A proven equivalent definition: indistinguishability (IND)
– Given two encrypted values, the attacker cannot distinguish them
• Remark: Semantic security under CPA is often written as IND-CPA

Security game

• IND-XXX can be modeled as a game
– The attacker generates two messages m0 and m1 and sends them to the key owner
– The key owner randomly chooses one message mi and encrypts it: c = E(mi)
– With c, the attacker guesses which plain message c corresponds to
– Secure if Pr(guess correct) <= 0.5 + ε
• Where ε is a negligible value, often in the form of 1/x^k
– Note: x is a constant, k is the key length
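The game above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `encrypt_otp` is a hypothetical stand-in for an ideal randomized scheme (XOR with a fresh random pad that is thrown away, so it cannot be decrypted), and `parity_attacker` is one arbitrary guessing strategy. The two messages have the same length, since the definition allows the size to leak.

```python
import random
import secrets

def encrypt_otp(message: bytes) -> bytes:
    """Toy randomized encryption: XOR with a fresh random pad (assumption, not a real scheme)."""
    pad = secrets.token_bytes(len(message))
    return bytes(a ^ b for a, b in zip(message, pad))

def play_round(encrypt, guess, m0: bytes, m1: bytes, rng: random.Random) -> bool:
    """One round of the game; returns True if the attacker guesses correctly."""
    i = rng.randrange(2)          # the key owner's secret choice
    c = encrypt((m0, m1)[i])      # challenge ciphertext c = E(m_i)
    return guess(c) == i

def parity_attacker(c: bytes) -> int:
    """Hypothetical attacker: guesses from the low bit of the first ciphertext byte."""
    return c[0] & 1

def success_rate(trials: int = 2000, seed: int = 0) -> float:
    """Fraction of rounds the attacker wins over many independent games."""
    rng = random.Random(seed)
    wins = sum(
        play_round(encrypt_otp, parity_attacker, b"yes", b"no!", rng)
        for _ in range(trials)
    )
    return wins / trials
```

Against this stand-in scheme the ciphertext is independent of the chosen message, so the measured success rate should stay close to 0.5, which is what the bound 0.5 + ε asks for.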

Security vs performance

• In general (but not proven), a more secure scheme is more expensive

• Fact 1: Non-deterministic encryption is required for semantic security
– Deterministic encryption
• E(x1) = E(x2) iff x1 = x2
• One-to-one mapping
• Onto function most of the time
– Simple attack
• The attacker generates g0 = E(m0), g1 = E(m1)
• If gi = c, answer i
• Pr(guess correct) = 100%

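[Figure: deterministic encryption as a one-to-one mapping between plaintexts 1, 2, 3 and ciphertexts drawn from a, b, c, d]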

Security vs performance

• Non-deterministic encryption
– One-to-many mapping
• Problem:
– Ciphertext is longer
– Storage cost and processing cost are thus higher

[Figure: non-deterministic encryption as a one-to-many mapping from plaintexts 1, 2, 3 to ciphertexts a through h]

Example

• RSA is a deterministic function
– Public key: <e, n>, private key: <d, n>
– E(x) = x^e mod n
– D(y) = y^d mod n
• RSA is not semantically secure
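To see why unpadded RSA fails the IND game, the sketch below replays the simple attack against deterministic encryption: the attacker re-encrypts both candidate messages with the public key and compares. The parameters (p = 61, q = 53, e = 17, d = 2753) are toy values chosen for illustration only and are far too small for real use. All function names are hypothetical.

```python
import random

# Toy textbook-RSA parameters (illustrative assumption): p = 61, q = 53.
N = 61 * 53   # n = 3233
E = 17        # public exponent
D = 2753      # private exponent: 17 * 2753 = 1 (mod 3120)

def rsa_encrypt(x: int) -> int:
    """E(x) = x^e mod n, using only the public key."""
    return pow(x, E, N)

def rsa_decrypt(y: int) -> int:
    """D(y) = y^d mod n, using the private key."""
    return pow(y, D, N)

def attacker_guess(c: int, m0: int, m1: int) -> int:
    """Re-encrypt m0 with the public key; if it matches c, the answer is 0."""
    return 0 if rsa_encrypt(m0) == c else 1

def attack_success_rate(m0: int, m1: int, trials: int = 1000, seed: int = 0) -> float:
    """Play the IND game repeatedly and report how often the attacker wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        i = rng.randrange(2)              # key owner's secret choice
        c = rsa_encrypt((m0, m1)[i])      # deterministic challenge ciphertext
        wins += attacker_guess(c, m0, m1) == i
    return wins / trials
```

Because the same plaintext always produces the same ciphertext, the attacker wins every round for any two distinct messages below n.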

RSA with padding

• When the industry refers to RSA, it actually means RSA with padding
– The padding scheme is optimal asymmetric encryption padding (OAEP)
– Proven IND-CCA2 (a strong security definition)
• Example of a simpler padding
– Encryption:
• Input: m
• Generate random r
• Let c = r xor m
• Ciphertext: c||E(r)
– Decryption:
• y = c||E(r)
• Recover r from D(E(r))
• Decrypted message: m = c xor r

This padding doubles the size of an encrypted value
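The simpler padding (c = r xor m, ciphertext c||E(r)) can be sketched as follows. This is a toy under stated assumptions: it reuses illustrative RSA parameters (n = 3233, e = 17, d = 2753), restricts m and r to 11 bits so that r stays below n, and represents the concatenation c||E(r) as a Python tuple. None of these choices come from the slides.

```python
import secrets
from typing import Tuple

N, E, D = 3233, 17, 2753   # toy RSA parameters (assumption), p = 61, q = 53
BITS = 11                  # 2**11 = 2048 < N, so every r fits below n

def encrypt(m: int) -> Tuple[int, int]:
    """Return (c, E(r)) with c = r xor m for a fresh random r."""
    if not 0 <= m < 2 ** BITS:
        raise ValueError("message must fit in BITS bits")
    r = secrets.randbits(BITS)
    return (r ^ m, pow(r, E, N))

def decrypt(ciphertext: Tuple[int, int]) -> int:
    """Recover r from D(E(r)), then m = c xor r."""
    c, encrypted_r = ciphertext
    r = pow(encrypted_r, D, N)
    return c ^ r
```

Each ciphertext carries two components of roughly the message size, which is the size doubling noted above. Encrypting the same m twice gives different ciphertexts with high probability because r is fresh each time.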

Secure database (SDB) problem

[Figure: the data owner (DO) sends a query to the service provider (SP), which holds the DB, and receives an answer]

• Database should be encrypted
• Compute query on encrypted data
• Return an encrypted answer

(In)-feasibility of IND in SDB problem

• Security game:
– The attacker generates two queries q0 and q1 and sends them to the DO
– The DO randomly chooses one query and executes it with the SP
– The (encrypted) result r is observed by the attacker
– With r, the attacker guesses which query r corresponds to

Attacker’s strategy

• Pick q0 = “SELECT count(*)”

• Pick q1 = “SELECT *”

• If r is just an encrypted value, it is q0

• If r is a table, it is q1

• To prevent the above attack, the query results must at least be indistinguishable by size: each query result has size Ω(n), where n is the number of tuples

• The decryption cost for the DO is then Ω(n), no better than computing the query with a linear scan

• Selection processing requires the SP to observe whether an encrypted tuple satisfies the query condition or not
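The size-based strategy can be illustrated with a small model. The sketch assumes a hypothetical `result_size` function that counts how many encrypted values a query returns (one for a count, n for a full table) and a hypothetical `padded_size` countermeasure that pads every result up to n values. It models only sizes, not actual encryption.

```python
Q0 = "SELECT count(*)"
Q1 = "SELECT *"

def result_size(query: str, n_tuples: int) -> int:
    """Hypothetical model: number of encrypted values in the query result."""
    return 1 if "count(*)" in query.lower() else n_tuples

def guess_query(observed_size: int) -> int:
    """Attacker's rule: a single encrypted value means q0, a table means q1."""
    return 0 if observed_size == 1 else 1

def padded_size(query: str, n_tuples: int) -> int:
    """Countermeasure sketch: pad every result up to n encrypted values."""
    return max(result_size(query, n_tuples), n_tuples)
```

With padding, both queries return n values, so the attacker's rule no longer separates them, but the DO must now decrypt n values per query.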

Remark: Fully homomorphic encryption with IND-CPA in SDB

Discussion paper: Shiyuan Wang, Divyakant Agrawal, and Amr El Abbadi. Is homomorphic encryption the holy grail for database queries on encrypted data? Technical report, Department of Computer Science, UCSB, 2012.

• Cannot jump to an encrypted address
• All operations can be supported in terms of circuits (AND, OR, NOT)
• All input and output are encrypted

Implication of knowing the result of a branch operation

[Figure: an unknown process branches to "Jump to a" or "Jump to b". Plain data: 10, 20, 21, 22, 23 and 29, 40. Annotation: knowledge of plaintext from CPA]

Implication of knowing the result of a branch operation

[Figure: the unknown process receives E(c) and branches to "Jump to a" or "Jump to b". Data shown: 22, 23 and 29, 40]

Attack: pick a = 50, b = 7

Attacker answer: c = a

Re-writing the query may help

• Original: if (x > 10) { y = 20; } else { y = 100; }

• Rewritten without a branch:

r = cmp_grt(x, 10) // returns 1 if x > 10, 0 otherwise

y = 100 - 80 * r // 20 if r = 1, 100 if r = 0

Cannot solve all problems!
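As a plaintext check of the rewrite idea, the sketch below compares the branching form with an arithmetic form y = 100 - 80 * r, which yields 20 when r = 1 and 100 when r = 0. Here `cmp_grt` is an ordinary Python stand-in; in the encrypted setting it would have to be evaluated as a circuit over ciphertexts, which this sketch does not attempt.

```python
def cmp_grt(x: int, bound: int) -> int:
    """Plaintext stand-in for the comparison: 1 if x > bound, 0 otherwise."""
    return 1 if x > bound else 0

def with_branch(x: int) -> int:
    """Original form: the control flow reveals which side was taken."""
    if x > 10:
        return 20
    return 100

def without_branch(x: int) -> int:
    """Rewritten form: the same arithmetic runs for every x."""
    r = cmp_grt(x, 10)
    return 100 - 80 * r
```

Both functions agree on every input, so the branch result never has to be exposed as a jump. The rewrite only works when both outcomes can be folded into one expression, which is why it cannot solve all problems.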

Leakage of knowing branch result in practice

• Assume now that we allow the SP to observe the branch (i.e., comparison) results. What kind of information is leaked?
– Locality of data

Result of cmp(Y, E(q1)): E(t1), E(t3), E(t5), E(t7), E(t10)

Result of cmp(Y, E(q2)): E(t1), E(t3), E(t9), E(t13)

Derived knowledge – COA:
1. q2 q1
2. q2 t1[Y], t3[Y] q1
3. t5[Y] t1[Y], t3[Y] t9[Y]

So, we just protect the exact values in our scheme. And the use of an index may make sense.

Another way to prove IND (in SMC)

• Proof by simulation
• Background
– Each party receives several messages from the other party
– Can they use this information to learn anything about the other party?

[Figure: secure sum. Alice: secret x = 3. Bob: secret y = 7. Result: x + y = 10]

Simulation

• Say Bob is the attacker now
• Is there any difference in the messages Bob receives if Alice provides a different input?
– Indistinguishable

[Figure: secure sum. Alice: secret x = 3. Bob: secret y = 7. Result: x + y = 10]

Secure Sum

Alice: secret x = 3
Bob: secret y = 7
Result: x + y = 10
Public parameter: n = 100

1. Alice generates r1 = 70 and sends m1 = r1 + x mod n = 73
2. Bob generates r2 = 50 and sends m2 = r2 + y + m1 mod n = 30
3. Alice keeps m2 - r1 as her share: a = 60
4. Bob keeps r2 as his share: b = 50
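The message flow of the secure sum example can be reproduced with the sketch below. The function names are hypothetical, and one point is an interpretation rather than something the slide states: since a = m2 - r1 = x + y + r2 (mod n) and b = r2, the sketch recombines the shares as a - b mod n. With the slide's numbers, a + b gives the same value only because 2 * r2 = 100 is a multiple of n.

```python
import secrets
from typing import Optional, Tuple

def secure_sum_shares(
    x: int, y: int, n: int,
    r1: Optional[int] = None, r2: Optional[int] = None,
) -> Tuple[int, int, int, int]:
    """Run the two-message protocol; returns (m1, m2, Alice's share a, Bob's share b)."""
    r1 = secrets.randbelow(n) if r1 is None else r1   # Alice's random mask
    r2 = secrets.randbelow(n) if r2 is None else r2   # Bob's random mask
    m1 = (r1 + x) % n          # Alice -> Bob
    m2 = (r2 + y + m1) % n     # Bob -> Alice
    a = (m2 - r1) % n          # Alice's share: x + y + r2 mod n
    b = r2                     # Bob's share
    return m1, m2, a, b

def reconstruct(a: int, b: int, n: int) -> int:
    """Combine the shares (assumed rule): a - b = x + y mod n."""
    return (a - b) % n
```

Calling `secure_sum_shares(3, 7, 100, r1=70, r2=50)` reproduces the values in the example: m1 = 73, m2 = 30, a = 60, b = 50.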

Bob’s view

Bob: secret y = 7
Result: x + y = 10
Public parameter: n = 100
Received: m1 = r1 + x mod n = 73

Simulation:
• For any value of x, generate r1’ = m1 – x mod n
• The message m1 can then be generated

Simulation succeeds. This protocol is secure w.r.t. IND.
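The simulation argument can be checked mechanically for the toy parameters. The sketch below (hypothetical helper names) computes, for a received m1, the mask r1' that explains it under each candidate input x, and also confirms that for a fixed x a uniform r1 makes m1 uniform over 0..n-1.

```python
from typing import List

def simulate_r1(m1: int, x_candidate: int, n: int) -> int:
    """The mask r1' = m1 - x mod n that makes x_candidate consistent with m1."""
    return (m1 - x_candidate) % n

def consistent_inputs(m1: int, n: int) -> List[int]:
    """All inputs x for which some mask reproduces the observed message m1."""
    return [x for x in range(n) if (simulate_r1(m1, x, n) + x) % n == m1]

def message_values(x: int, n: int) -> List[int]:
    """Every m1 that input x can produce as r1 ranges over 0..n-1, sorted."""
    return sorted((r1 + x) % n for r1 in range(n))
```

For m1 = 73 and n = 100, every x from 0 to 99 is consistent with the message, so m1 by itself tells Bob nothing about Alice's input.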

A not secure example

Key agreement protocol

Public parameters: p, g

Bob observed: YA, XB

How to derive XA?

Note: it must be one specific XA such that YA = g^XA

Simulation fails.

Note: This protocol is not for protecting parties’ input from the other party
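The contrast with secure sum can be made concrete with a toy search. The sketch assumes a Diffie-Hellman-style setting with exponentiation modulo p, and uses illustrative parameters p = 23, g = 5 (5 generates the full multiplicative group modulo 23). These values and the helper name are assumptions, not taken from the slides.

```python
from typing import List

P, G = 23, 5   # toy public parameters (assumption); 5 is a primitive root mod 23

def matching_exponents(y_a: int, g: int = G, p: int = P) -> List[int]:
    """All exponents x in 0..p-2 with g^x mod p equal to the observed value."""
    return [x for x in range(p - 1) if pow(g, x, p) == y_a]
```

For an observed YA = 5^6 mod 23 = 8 the only matching exponent is 6, whereas in secure sum every input was consistent with the observed message. The simulator cannot pick an arbitrary XA and still explain YA, which is why the simulation fails here.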

Relaxed security definition

• Also the approach of our paper
• Bounded leakage of protocols
– Can be proven by simulation
• Used a lot by Chris Clifton from Purdue University

Jaideep Vaidya and Chris Clifton, Secure Set Intersection Cardinality with Application to Association Rule Mining, JCS 13(4), 2005.
Jaideep Vaidya and Chris Clifton, Privacy-Preserving K-Means Clustering over Vertically Partitioned Data, SIGKDD, 2003.
Murat Kantarcioglu and Chris Clifton, Privacy Preserving Data Mining of Association Rules on Horizontally Partitioned Data, TKDE 16(9), 2004.

Proof of relaxed definition

• Attacker’s knowledge
– Its own input
– Messages in the protocol
– Leaked knowledge
• If the above is enough to simulate the execution of the protocol, there is no other information leak
• Then, argue that the leaked knowledge is not very harmful
