Analysis of Searchable Encryption Schemes

Nagendra Posani and Swarnim Vyas

December 12, 2016

Abstract

Searchable encryption remains one of the most widely required functionalities of cloud storage. In this paper, we provide a security analysis of popular schemes, including a study of their implementations and security definitions. We cover Order Preserving Symmetric Encryption, Order Revealing Encryption, and Partial Order Preserving Encoding.

1 Introduction

With the advent of cloud storage and the need for secure storage, searchable encryption has become a topic of great interest. There have been multiple attempts to strike a balance between efficient functionality and security, which trade off against each other. The paper of [ABO09] opened up many avenues for research in this area. There have also been multiple attempts to define a single notion of security for such schemes that captures message security the way standard notions like IND-CCA and IND-CPA do. In this paper we discuss IND-DCPA and IND-OCPA, and which schemes are secure under these notions. We also discuss the POPF security notion for OPE schemes described by [ABO]. We define Order Revealing Encryption [DBZ] and its latest improvement [KL], which is IND-OCPA secure. We also discuss Partial Order Preserving Encoding [DSRY], which keeps its data encrypted under a strong symmetric encryption scheme and learns the order of the ciphertexts dynamically as the user makes range queries. Finally, we provide a basic landscape of the schemes in terms of functionality and security.

2 Schemes and Security Notions

2.1 Randomized Encryption

To store data securely in the cloud, we can encrypt it with any good IND-CPA secure symmetric encryption scheme. Such schemes are randomized, so no information about a message leaks through its ciphertext that could be used to speed up a search. A search query over data encrypted this way therefore forces a scan of the entire encrypted data set to find a match. The scheme is secure, but search is inefficient: every query costs a linear search, which leads to poor performance and a long round-trip time (RTT). This is the trade-off between security and efficient functionality mentioned above.
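The linear-scan cost can be illustrated with a minimal sketch. The cipher below is a toy stand-in built from HMAC-SHA256 in counter mode, chosen only so that the example runs with the standard library; it is an assumption of this sketch and not a scheme from the cited papers. The names `encrypt`, `decrypt`, and `linear_search` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand (key, nonce) into `length` bytes using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, message: bytes) -> bytes:
    """Toy randomized encryption: a fresh nonce makes equal plaintexts look different."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(message))
    return nonce + bytes(m ^ s for m, s in zip(message, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def linear_search(key: bytes, store: list, target: bytes) -> list:
    """Return indices of matching records; every record must be decrypted, so O(n)."""
    return [i for i, ct in enumerate(store) if decrypt(key, ct) == target]


key = secrets.token_bytes(32)
store = [encrypt(key, w) for w in (b"alice", b"bob", b"alice", b"carol")]
matches = linear_search(key, store, b"alice")
```

The two encryptions of `b"alice"` differ because of the random nonce, so nothing short of decrypting each record tells the searcher which ciphertexts match.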

2.2 Order Preserving Encryption

The scheme in Section 2.1 uses IND-CPA secure encryption. The ciphertexts are therefore fully randomized, and we lose efficiency in search. To provide more efficient search, [ABO09] suggested using a deterministic scheme: Order Preserving Encryption (OPE). In this scheme, encryption preserves the order of the messages in the ciphertexts. For example, if m1 < m2 then Encryption(m1) < Encryption(m2). This lets the server search sorted ciphertexts directly, and the performance, as shown in Figure 1, is better than that of a linear search.

Formally, for A, B ⊆ N with |A| ≤ |B|, a function f : A → B is order preserving (also called strictly increasing) if for all i, j ∈ A, f(i) > f(j) if and only if i > j. We say a deterministic encryption scheme SE = (K, Enc, Dec) with plaintext space D and ciphertext space R is order preserving if Enc(k, ·) is an order-preserving function from D to R for every key k output by the key-generation algorithm K (with elements of D and R interpreted as numbers, encoded as strings). Unless otherwise stated, we assume the plaintext space is [M] and the ciphertext space is [N] for some N ≥ M ∈ N.
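The definition can be made concrete with a small sketch. The keyed table below is a toy order-preserving function built from cumulative HMAC-derived gaps; it is an illustrative assumption, not the construction of [ABO09], and the names `toy_ope_table`, `is_order_preserving`, and `range_query` are hypothetical. It shows why order preservation lets a server answer range queries by binary search over ciphertexts.

```python
import bisect
import hashlib
import hmac


def toy_ope_table(key: bytes, domain_size: int) -> list:
    """Map plaintexts 1..domain_size to strictly increasing integers (toy, not secure)."""
    table = []
    total = 0
    for m in range(1, domain_size + 1):
        digest = hmac.new(key, m.to_bytes(4, "big"), hashlib.sha256).digest()
        total += 1 + digest[0]  # every gap is at least 1, so the table is strictly increasing
        table.append(total)
    return table


def is_order_preserving(table: list) -> bool:
    """Check f(i) < f(j) for consecutive plaintexts, which implies it for all i < j."""
    return all(a < b for a, b in zip(table, table[1:]))


def range_query(sorted_cts: list, lo_ct: int, hi_ct: int) -> list:
    """Server-side range query: two binary searches, no decryption needed."""
    start = bisect.bisect_left(sorted_cts, lo_ct)
    stop = bisect.bisect_right(sorted_cts, hi_ct)
    return sorted_cts[start:stop]


key = b"demo key, not secret"
table = toy_ope_table(key, 100)


def enc(m: int) -> int:
    return table[m - 1]


stored = sorted(enc(m) for m in (42, 7, 93, 15, 60))
hits = range_query(stored, enc(10), enc(60))
```

Here `hits` contains exactly the ciphertexts of 15, 42, and 60, found in logarithmic time because the server can compare ciphertexts as if they were plaintexts.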

(a) Order Preserving Encryption (OPE) (b) OPE revealing the length.

Figure 1: OPE & Security

2.3 IND-DCPA and IND-OCPA

No deterministic scheme is IND-CPA secure, so when using a deterministic encryption scheme like the one in Section 2.2 we need a new security notion. IND-DCPA (Indistinguishability under Distinct Chosen Plaintext Attack) [MB] restricts the adversary to making only distinct queries on either side of the oracle. Because a deterministic scheme leaks plaintext equality, without this restriction it would succumb to a trivial attack. Formally, supposing the adversary A makes queries $(m_0^1, m_1^1), \ldots, (m_0^q, m_1^q)$, the notion requires that $m_b^1, \ldots, m_b^q$ are all distinct for $b \in \{0, 1\}$. IND-OCPA (Indistinguishability under Ordered Chosen Plaintext Attack) is a generalized form of IND-DCPA. It adds that the ciphertexts reveal nothing to the adversary except the order of the plaintexts. Formally, it also requires that the queries made by A satisfy $m_0^i < m_0^j$ if and only if $m_1^i < m_1^j$ for all $1 \le i, j \le q$.
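The two query restrictions can be checked mechanically. The following sketch, with hypothetical function names, tests whether a list of left/right query pairs is admissible under each notion as stated above; it models only the restriction on the adversary's queries, not the security game itself.

```python
def is_dcpa_admissible(queries: list) -> bool:
    """IND-DCPA: the left messages are pairwise distinct, and so are the right messages."""
    left = [m0 for m0, _ in queries]
    right = [m1 for _, m1 in queries]
    return len(set(left)) == len(left) and len(set(right)) == len(right)


def has_same_order_pattern(queries: list) -> bool:
    """Order condition: m0_i < m0_j if and only if m1_i < m1_j, for every pair (i, j)."""
    return all(
        (a0 < b0) == (a1 < b1)
        for a0, a1 in queries
        for b0, b1 in queries
    )


def is_ocpa_admissible(queries: list) -> bool:
    """IND-OCPA as described above: the distinctness condition plus the order condition."""
    return is_dcpa_admissible(queries) and has_same_order_pattern(queries)


ordered = [(1, 5), (2, 7), (9, 8)]    # both sides are sorted the same way
swapped = [(1, 7), (2, 5)]            # distinct, but the order differs between sides
repeated = [(1, 5), (1, 7)]           # the left side repeats a message
```

The `swapped` sequence is allowed under IND-DCPA but not under IND-OCPA: an order-preserving ciphertext pair would immediately reveal which side was encrypted.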

2.4 Is OPE IND-Ordered CPA?

Unfortunately, OPE is not IND-OCPA secure in practice: apart from the order of the messages, it also leaks information about the distance between them. Say m2 = m1 + 10; then Encryption(m1) and Encryption(m2) give the adversary an idea of how far apart m1 and m2 are. [ABO09] shows that IND-OCPA is unachievable by a practical order-preserving encryption scheme. Precisely, an OPE scheme cannot be IND-OCPA secure unless its ciphertext space is extremely large (exponential in the size of the plaintext space).

2.5 OPE - POPF & Window One-Wayness

[ABO] shows that, for a database of randomly distributed plaintexts and an appropriate choice of parameters, ROPF encryption leaks neither the precise value of any plaintext nor the precise distance between any two of them. Informally, the POPF notion calls an OPE scheme secure if oracle access to its encryption algorithm is indistinguishable from oracle access to a random order-preserving function (ROPF), i.e., a random element of the set of all strictly increasing functions on the same domain and range. This is a fairly straightforward adaptation to the order-preserving setting of the classical notion of a pseudorandom function (PRF), which asks that oracle access to a function be indistinguishable from oracle access to a truly random function on the same domain and range, and it captures some intuition of what the "best possible" OPE scheme should be. However, the POPF definition gives little direct indication of what kind of security it actually provides. The analysis in [ABO] addresses the central concerns about ROPF ciphertexts: whether they leak the locations of plaintexts or the distances between plaintexts. For this, [ABO] proposed several varieties of one-wayness, such as (r, z)-Window One-Wayness and (r, z)-Window Distance One-Wayness. The results say that an ROPF is secure for small windows and insecure for large windows. The paper also gives lower and upper bounds for both scenarios.
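A random order-preserving function is easy to sample for small parameters, which gives some intuition for the window results. The sketch below draws a uniformly random strictly increasing function from [M] to [N] as a sorted random M-subset of [N], then guesses each plaintext by rescaling its ciphertext. The parameters, the seed, and the names `sample_ropf` and `guess_plaintext` are assumptions made for illustration; this is not the analysis of [ABO].

```python
import random


def sample_ropf(M: int, N: int, rng: random.Random) -> list:
    """Uniform strictly increasing f: [M] -> [N], represented as a sorted M-subset of [N]."""
    return sorted(rng.sample(range(1, N + 1), M))


def guess_plaintext(c: int, M: int, N: int) -> int:
    """Estimate the plaintext location by scaling the ciphertext back into [M]."""
    return max(1, min(M, round(c * (M + 1) / (N + 1))))


rng = random.Random(2016)
M, N = 1000, 1_000_000
f = sample_ropf(M, N, rng)  # f[m - 1] is the ciphertext of plaintext m

errors = [abs(guess_plaintext(f[m - 1], M, N) - m) for m in range(1, M + 1)]
worst = max(errors)
exact = sum(1 for e in errors if e == 0)
```

In this toy setting the rescaled guess lands within a window that is small relative to M, while `exact` shows how rarely the precise plaintext is hit. That matches the intuition that a large window around the guess is easy to find and the exact value is not.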

2.6 Order Revealing Encryption

A secret-key encryption scheme is order-revealing [DBZ] if there is a public procedure that takes two encrypted plaintexts as input and reports the lexicographic ordering of the plaintexts. This procedure, which we call the order-revealing algorithm, requires no secrets and can be evaluated by anyone. More precisely, an order-revealing scheme is a tuple (G, E, D) of algorithms. Algorithm G outputs a pair (sk, comp), where sk is a secret encryption key and comp(·, ·) is an efficient deterministic algorithm that takes two ciphertexts as input and outputs either ‘<’ or ‘>’.

The [DBZ] construction of ORE begins with a simple automaton for the comparison function on two inputs, which they represent as a low-width matrix branching program. They encrypt plaintexts in such a way that, given two independently created ciphertexts, anyone can run the comparison branching program to reveal the relative ordering of the corresponding plaintexts. However, this scheme suffers from inference attacks that reveal almost all of the plaintext information given auxiliary data [FBD], so new constructions have been proposed to overcome these attacks.

Figure 2: ORE Encryption: New Construction ¹

2.7 New Construction in Order Revealing Encryption

The new construction of ORE in [KL] extends a small-domain ORE with best-possible security to a large-domain ORE with partial leakage, using a domain-extension technique inspired by [NCW].

2.7.1 Small Domain ORE

It considers the message space to be small, say {1, 2, 3, ..., N}, and associates each value with a key, thus requiring N keys in total. These N keys (k1, k2, ..., kN) can be generated from a PRF. Each value i is encoded as an N-bit vector in which all positions ≤ i have value 1 and all positions > i have value 0. To encrypt message i, each bit of this vector is encrypted under its own key: the first bit under k1, the second under k2, and so on up to the Nth bit under kN. Whenever the user makes a query, ki has to be given to the server along with the query to allow comparison. But this reveals the value i itself. To avoid revealing i, the first bit is instead encrypted under kπ(1), the second under kπ(2), and so on up to the Nth bit under kπ(N), where π is a random permutation. Querying a comparison with i then does not reveal i: we send kπ(i) rather than ki, so the server learns nothing about the value.
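One way to realize this idea is sketched below, under several stated assumptions: values are numbered 0..N-1, per-slot keys are derived with HMAC-SHA256 as the PRF, each stored ciphertext carries a fresh nonce, and the query token is the pair (permuted position, slot key). The class and function names are hypothetical, and the shuffle is a toy stand-in for a keyed pseudorandom permutation. This is a simplified sketch of the description above, not the exact construction of [KL].

```python
import hashlib
import hmac
import random
import secrets


def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


class SmallDomainORE:
    def __init__(self, n: int):
        self.n = n
        self.key = secrets.token_bytes(32)
        self.perm = list(range(n))
        random.SystemRandom().shuffle(self.perm)  # toy stand-in for a secret permutation pi

    def _slot_key(self, pos: int) -> bytes:
        """Key for the slot stored at permuted position `pos`."""
        return prf(self.key, pos.to_bytes(4, "big"))

    def encrypt_stored(self, y: int):
        """Stored ciphertext: for every value j, a masked bit [j <= y] at position pi(j)."""
        nonce = secrets.token_bytes(16)
        slots = [0] * self.n
        for j in range(self.n):
            pos = self.perm[j]
            mask = prf(self._slot_key(pos), nonce)[0] & 1
            slots[pos] = (1 if j <= y else 0) ^ mask
        return nonce, slots

    def query_token(self, x: int):
        """Query token: the permuted position of x and the key for that single slot."""
        pos = self.perm[x]
        return pos, self._slot_key(pos)


def compare(token, stored) -> bool:
    """Public comparison: unmask one slot and learn only whether x <= y."""
    pos, slot_key = token
    nonce, slots = stored
    mask = prf(slot_key, nonce)[0] & 1
    return bool(slots[pos] ^ mask)


ore = SmallDomainORE(8)
result = compare(ore.query_token(3), ore.encrypt_stored(5))  # 3 <= 5
```

The token exposes only the permuted position, so the party running `compare` can unmask one bit per stored ciphertext without learning which value the position stands for.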

2.7.2 Extending Small Domain ORE

The basic idea is to decompose a message into smaller blocks and apply the small-domain ORE to each block. The keys for each block are derived from the prefix formed by the preceding blocks. The overall leakage is the position of the first block that differs. The practical ORE of [NCW] leaks the first differing bit, whereas this scheme leaks only the first differing block, and thus provides better security in the overall landscape.
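The difference between the two leakage profiles can be seen on a small example. The sketch below computes only the leakage, that is, the index of the first differing block for a given block size; it performs no encryption, and the function names and the 8-bit example values are assumptions made for illustration.

```python
def split_blocks(value: int, total_bits: int, block_bits: int) -> list:
    """Split a total_bits-wide integer into blocks, most significant block first."""
    if total_bits % block_bits != 0:
        raise ValueError("block_bits must divide total_bits")
    n_blocks = total_bits // block_bits
    mask = (1 << block_bits) - 1
    return [
        (value >> (block_bits * (n_blocks - 1 - i))) & mask
        for i in range(n_blocks)
    ]


def first_differing_block(x: int, y: int, total_bits: int, block_bits: int):
    """Index of the first block where x and y differ, or None if they are equal."""
    blocks_x = split_blocks(x, total_bits, block_bits)
    blocks_y = split_blocks(y, total_bits, block_bits)
    for i, (a, b) in enumerate(zip(blocks_x, blocks_y)):
        if a != b:
            return i
    return None


x = 0b10110110
y = 0b10110011

bit_leak = first_differing_block(x, y, total_bits=8, block_bits=1)    # bit-level leakage
block_leak = first_differing_block(x, y, total_bits=8, block_bits=4)  # block-level leakage
```

With 1-bit blocks the comparison pins the difference down to bit index 5, while with 4-bit blocks it reveals only that the second block (index 1) differs, which is coarser information about where the two messages diverge.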

2.8 Partial Order Preserving Encoding

[DSRY] describes a new OPE scheme called Partial Order Preserving Encoding (POPE). This is an application-specific encoding that can provide better performance in big-data scenarios where there are many insertions and few range queries. Data is encrypted with a typical "strong" randomized symmetric cipher and stored in that form.

The server stores a partially ordered B-tree. Whenever an insertion takes place, the ciphertext of the message is computed and saved without maintaining any order. The user maintains a buffer of size l. Whenever the user makes a range query, the server returns its whole storage if it fits within the user's buffer. Otherwise, if the storage is larger than the user's buffer, the server promotes m random items and sends them to the client. The client sorts, stores, and remembers the m items and sends them back to the server. The client then partitions the remaining items, and after this processing the result of the query is returned. Figure 3 gives the sequential flow of the scheme.

After this, further insertions are made, and whenever a range query is received the same process is followed, but on the partial B-tree. The server thus slowly learns the order of the ciphertexts with each query, and not otherwise. The average cost per operation is O(1), and the worst-case round complexity per operation is O(1), assuming: 1) n insertions; 2) reasonable client-side temporary storage, L ∈ Ω(n^O(1)); 3) not too many range queries, m ≤ n/L.
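The flow described above can be sketched as a small simulation. This is a one-level simplification made for illustration: it builds a single layer of pivots and buckets instead of the recursive partial B-tree of [DSRY], and it uses an XOR with a fixed key as a placeholder for the strong randomized cipher. All class and method names are hypothetical.

```python
import bisect
import random


class Client:
    """Holds the key; the only party that can compare plaintexts."""

    def __init__(self, key: int):
        self.key = key

    def enc(self, m: int) -> int:
        return m ^ self.key  # toy placeholder, NOT secure; a real cipher would be randomized

    def dec(self, c: int) -> int:
        return c ^ self.key

    def sort(self, cts: list) -> list:
        return sorted(cts, key=self.dec)

    def bucket_of(self, pivots: list, ct: int) -> int:
        """Tell the server which bucket a ciphertext belongs to."""
        return bisect.bisect_left([self.dec(p) for p in pivots], self.dec(ct))


class Server:
    def __init__(self, buffer_size: int, seed: int = 0):
        self.L = buffer_size
        self.rng = random.Random(seed)
        self.unsorted = []  # insertions land here with no ordering
        self.pivots = []    # sorted by the client on the first large query
        self.buckets = []   # unsorted buckets between consecutive pivots

    def insert(self, ct: int) -> None:
        self.unsorted.append(ct)  # O(1), nothing about order is learned

    def range_query(self, client: Client, lo_ct: int, hi_ct: int) -> list:
        if not self.pivots:
            if len(self.unsorted) <= self.L:
                return list(self.unsorted)  # small enough: return everything
            picked = set(self.rng.sample(range(len(self.unsorted)), self.L))
            self.pivots = client.sort([self.unsorted[i] for i in picked])
            self.unsorted = [c for i, c in enumerate(self.unsorted) if i not in picked]
            self.buckets = [[] for _ in range(self.L + 1)]
        for ct in self.unsorted:  # client-assisted partitioning
            self.buckets[client.bucket_of(self.pivots, ct)].append(ct)
        self.unsorted = []
        i = client.bucket_of(self.pivots, lo_ct)
        j = client.bucket_of(self.pivots, hi_ct)
        candidates = list(self.pivots[i:j + 1])
        for bucket in self.buckets[i:j + 1]:
            candidates.extend(bucket)
        return candidates


def query(client: Client, server: Server, lo: int, hi: int) -> list:
    cts = server.range_query(client, client.enc(lo), client.enc(hi))
    return sorted(m for m in map(client.dec, cts) if lo <= m <= hi)


client = Client(key=0x5A5A)
server = Server(buffer_size=4)
values = list(range(50))
random.Random(1).shuffle(values)
for v in values:
    server.insert(client.enc(v))
first = query(client, server, 10, 20)
server.insert(client.enc(100))
server.insert(client.enc(15))
second = query(client, server, 10, 100)
```

Before the first query the server holds only an unordered list; afterwards it knows the order of the pivots and which bucket each item falls into, but nothing about the order inside a bucket. The client filters the returned candidates locally, so each query costs interaction with the client, which is the overhead discussed for query-heavy workloads.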

¹ Images are taken from David Wu’s presentation on ORE at the CCS-16 Conference.
² Images are taken from Daniel S. Roche’s presentation on POPE at the CCS-16 Conference.

(a) Client makes a range query
(b) Server returns random m values to the client
(c) Client returns the sorted m values back to the server
(d) Client partitions the remaining items by answering the queries from the server
(e) After the processing, the result of the range query is returned

Figure 3: POPE ²

Figure 4: Overall Landscape ³

3 Conclusion

Any deterministic scheme should be selected with caution. The notions described and proven in the discussed papers should be properly understood before these schemes are used in an application. The selection is also highly dependent on the application and the tasks it has to perform. For example, in an application dealing with big data, where a large number of insertions take place and few queries are made, POPE can be deployed. However, in an application that demands a large number of queries, performance would deteriorate with POPE, as it requires a lot of interaction with the client and would increase the overall round-trip time of query results. OPE leaks some information about the underlying data, and therefore practitioners should carefully evaluate the security and functionality achieved when using OPE. Also, in a public-key setting, OPE is susceptible to a brute-force attack using binary search.

Therefore, the choice of scheme should be evaluated on the basis of the trade-off between efficient functionality and the security it provides. The security of the scheme should be well understood before it is used, and the nature of the application for which it is being used should also be considered. There is no single silver-bullet scheme that solves all problems in the world of searchable encryption. Continuous research and rigorous testing in realistic scenarios are needed before these schemes are deployed for widespread use.

Today, ORE has been implemented by CipherCloud and SkyHigh, has been prototyped by Google and Microsoft, and is used in academic projects like CryptDB [FBD].

References

[ABO] Alexandra Boldyreva, Nathan Chenette, and Adam O’Neill. Order-preserving encryption revisited: Improved security analysis and alternative solutions. CRYPTO.

[ABO09] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O’Neill. Order-preserving symmetric encryption. EUROCRYPT, pages 224–241, 2009.

[DBZ] Dan Boneh, Kevin Lewi, Mariana Raykova, Amit Sahai, Mark Zhandry, and Joe Zimmerman. Semantically secure order-revealing encryption: Multi-input functional encryption without obfuscation. EUROCRYPT.

[DSRY] Daniel S. Roche, Daniel Apon, Seung Geol Choi, and Arkady Yerukhimovich. POPE: Partial order preserving encoding. ACM CCS.

[FBD] F. Betül Durak, Thomas M. DuBuisson, and David Cash. What else is revealed by order-revealing encryption? ACM CCS.

[KL] Kevin Lewi and David J. Wu. Order-revealing encryption: New constructions, applications, and lower bounds. ACM CCS.

³ Images are taken from Daniel S. Roche’s presentation on POPE at the CCS-16 Conference.

[MB] M. Bellare, T. Kohno, and C. Namprempre. Authenticated encryption in SSH: Provably fixing the SSH binary packet protocol. ACM CCS.

[NCW] Nathan Chenette, Kevin Lewi, Stephen A. Weis, and David J. Wu. Practical order-revealing encryption with limited leakage. FSE.
