secure query communication and processing …csi.utdallas.edu/paper_links/dissertation-xiao.pdf ·...


Page 1: SECURE QUERY COMMUNICATION AND PROCESSING …csi.utdallas.edu/Paper_Links/Dissertation-Xiao.pdf · 2015-10-15 · The security problems with the outsourced databases can be solved

SECURE QUERY COMMUNICATION AND PROCESSING PROTOCOLS FOR

CRITICAL CLOUD APPLICATIONS

by

Liangliang Xiao

APPROVED BY SUPERVISORY COMMITTEE:

___________________________________________

I-ling Yen, Chair

___________________________________________

Ding-Zhu Du

___________________________________________

Dung T. Huynh

___________________________________________

Murat Kantarcioglu


Copyright 2012

Liangliang Xiao

All Rights Reserved


Dedicated to my family.


SECURE QUERY COMMUNICATION AND PROCESSING PROTOCOLS FOR

CRITICAL CLOUD APPLICATIONS

by

LIANGLIANG XIAO, B.S., M.S.

DISSERTATION

Presented to the Faculty of

The University of Texas at Dallas

in Partial Fulfillment

of the Requirements

for the Degree of

DOCTOR OF PHILOSOPHY IN

COMPUTER SCIENCE

THE UNIVERSITY OF TEXAS AT DALLAS

December, 2012


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Dr. I-ling Yen, for her guidance, advice,

valuable assistance, and unending support. She is truly the best advisor I have ever met. Her

sharp and clear insights into various aspects made sure that I focused on the most interesting

problems. Also, her painstaking work in helping me write this Dissertation and other

technical papers is greatly appreciated. Thank you, Dr. Yen, for everything you have done for

me. I would also like to show my deepest appreciation to the members of my Dissertation

committee and professors who have advised me: Dr. Dung T Huynh, Dr. Ding-Zhu Du, Dr.

Murat Kantarcioglu, Dr. Bhavani Thuraisingham, Dr. Farokh B. Bastani, and Dr. Vincent Ng.

I want to thank my classmates and friends who in one way or another were of assistance to me. I

appreciate all of them for their understanding, encouragement, and suggestions: Osbert Bastani,

Manghui Tu, Jicheng Fu, Tong Gao, Qingkai Ma, Jian Huang, Wenke Zhang, Yunqi Ye, Yunlin

Dong, Longsheng Xia, Panfeng Xue, Yansheng Zhang, Na Zhao, Wei Zhu, Daichao Lu, and

Guang Zhou.

Last but not least, my family has been with me for all the decisions that I have made and

always stood by me. I want to thank my father and my mother for their nourishment and

guidance.

My research is partly sponsored by the NSF Net-Centric Software and Systems I/UCRC under

Award No. 0855944, the NSF Fundamental Research Program under Award No. 1128270, and

the Air Force Office of Scientific Research under Award No. FA-9550-08-1-0260.


August 2012


SECURE COMPUTATION AND COMMUNICATION PROTOCOLS FOR

CRITICAL CLOUD APPLICATIONS

Publication No. ___________________

Liangliang Xiao, Ph.D.

The University of Texas at Dallas, 2012

ABSTRACT

Supervising Professor: Dr. I-ling Yen

For over a decade, it has been common practice for companies to outsource their online business logic to Web

hosting service providers. Generally, the databases as well as the business logic of

a company are hosted by a third party to save IT management time and cost. Cloud

computing pushes this paradigm further. There are many cloud-based data centers which

store a very large amount of data from different sources and support data-centric computations.

Security can be a major concern for such data centers when the data they host are sensitive. A

data center may be attacked and compromised. Also, there exists the potential of insider attacks.

If there is a change in management, such as reorganization or buyout, the potential threat

increases due to the additional exposure to multiple management personnel and the unestablished

policies regarding the handling of critical information in such situations.

The security problems with the outsourced databases can be solved if the critical data are

encrypted. Naturally it leads to the problem of how the data center can perform computations on


encrypted data. Some general computations in data intensive systems include arithmetic

operations and search (exact match search and range search). Several secure computation

techniques in the literature can help achieve these computations, including homomorphic

encryption (HE), order-preserving encryption (OPE), prefix-preserving encryption (PPE), and

multi-party secure computation. Multi-party secure computation can securely perform addition

and multiplication operations on shared data, but it requires $O(n^2)$ communication overhead

for each multiplication operation, where $n$ is the number of shares, and hence has a high

communication cost. HE allows the arithmetic computation (addition and multiplication) on the

plaintexts to be directly performed on the ciphertexts. OPE preserves the order of the plaintexts.

Thus, range search queries can be processed directly on the data. PPE requires that the length of

the longest common prefix of two plaintexts is equal to that of the ciphertexts. Thus, prefix-

matching search and range search can be performed directly on the data.

However, there are limitations in the existing works on HE, OPE, and PPE. Current circuit

based HE schemes are computationally very expensive, and the security analyses of OPE and PPE are

not sufficient. Moreover, the existing HE, OPE, and PPE schemes all consider a single encryption key. Thus,

it is difficult to apply them to multi-user systems where the users have different access privileges

to the database. In this Dissertation, we overcome some of the limitations of HE/OPE/PPE in

existing works. We construct an efficient (non-circuit based) HE scheme and prove its security,

analyze the security of OPE and PPE schemes, and develop mechanisms for HE, OPE, PPE to

extend them to multi-user systems. The results presented in this Dissertation greatly enhance the

state-of-the-art in secure computations.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1 INTRODUCTION

1.1 Homomorphic Encryption

1.2 Order-Preserving Encryption

1.3 Prefix-Preserving Encryption

1.4 Overview and Contributions

1.5 Dissertation Layout

CHAPTER 2 LITERATURE SURVEY

2.1 Homomorphic encryption

2.2 Order-preserving encryption

2.3 Prefix-preserving encryption

CHAPTER 3 SYSTEM MODEL

3.1 Single-user and Multiple-user Systems

3.2 Database Model

3.3 Basic Definitions of the Encryption Algorithms

3.4 Request and Response Protocols

3.5 Limitations of Database Encryption

3.6 Adversary Model

CHAPTER 4 HOMOMORPHIC ENCRYPTION

4.1 Preliminaries

4.2 The Homomorphic Encryption Scheme

4.3 Homomorphic Encryption in Multi-user Systems

4.4 Performance of Our Homomorphic Encryption Scheme

4.5 Summary

CHAPTER 5 ORDER-PRESERVING ENCRYPTION

5.1 Background

5.2 Security of OPE

5.3 The limitation of the Ideal OPE Object

5.4 Generalized OPE

5.5 Overview of OPE to Multi-user Systems

5.6 The Basic DOPE Protocol for Multi-user Systems

5.7 The OE-DOPE Protocol for Multi-user Systems

5.8 Performance Study

5.9 Summary

CHAPTER 6 PREFIX-PRESERVING ENCRYPTION

6.1 Ideal PPE Object

6.2 Security of PPE

6.3 PPE for Multi-user Systems

6.4 Performance Study

6.5 Summary

CHAPTER 7 SUMMARY AND FUTURE RESEARCH

APPENDIX

A.1 Security Proof for OPE

A.2 Security Proof for PPE

REFERENCES

VITA


LIST OF TABLES

4.1 The performance of the communication protocol

5.1 Performance of Hyper and Poly OPE schemes

5.2 Comparisons of the basic-DOPE and OE-DOPE with Hyper OPE scheme

5.3 Comparisons of the basic-DOPE and OE-DOPE with Poly OPE scheme

5.4 Performances of the OE-DOPE for different q

6.1 Encryption Cost (in milliseconds) Comparisons for Different Protocols


LIST OF FIGURES

2.1 The PPE algorithm

3.1 Single-user and Multi-user Systems

4.1 Request processing protocol

4.2 The size of $m$ against the speed of addition and multiplication

5.1 DOPE scheme 𝒮𝒠pλ,μ (𝒦pλ,μ, 𝒠pλ,μ, 𝓓pλ,μ)

5.2 The pseudo code for the basic-DOPE protocol

5.3 The structure and message flow of the basic-DOPE protocol

5.4 The OE-DOPE protocol

5.5 Message Flow of the OE-DOPE protocol

6.1 The DLLCP attack

6.2 The Protocol 𝑃𝒠𝑑

6.3 The Reduction Algorithm RA

6.4 Computation Cost of Secret Sharing over Zp and G (Share Number m = 6)

6.5 Encryption Cost Comparisons for Different Protocols

A.1 Numerically Computed $c' = z_0/\log m$ Against $m$


CHAPTER 1

INTRODUCTION

For over a decade, it has been common practice for companies to outsource their online business logic to Web

hosting service providers. Generally, the databases as well as the business logic of

a company are hosted by a third party to save IT management time and cost. Cloud

computing pushes this paradigm further. There are many cloud-based data centers which

store a very large amount of data from different sources and support data-centric computation.

Security can be a major concern for such data centers when the data they host are sensitive. A

data center may be attacked and compromised. Also, there is the potential for insider attacks. If

there is a change in management, such as a reorganization or buyout [44], the potential threat

increases due to the additional exposure to multiple management personnel and the unestablished

policies regarding the handling of critical information in such situations.

The security problems with the outsourced databases can be solved if the critical data are

encrypted. Naturally it leads to the problem of how the data center can perform computation on

encrypted data [61]. Some general computations in data intensive systems include arithmetic

operations and search (exact match search and range search). Correspondingly, several secure

computation techniques, such as homomorphic encryption (HE) [14, 16, 27, 31, 47, 50, 59, 62,

65, 67, 71], order-preserving encryption (OPE) [3, 6, 12, 13, 38, 39, 53], and prefix-preserving

encryption (PPE) [4, 48, 78] are promising solutions to this problem.


HE allows computations (addition and multiplication) on the plaintexts to be performed directly

on the ciphertexts. In other words, $\mathcal{E}_{HE}(x + y)$ and $\mathcal{E}_{HE}(x \cdot y)$ can be computed from

$\mathcal{E}_{HE}(x)$ and $\mathcal{E}_{HE}(y)$ by publicly known algorithms, where $\mathcal{E}_{HE}$ is the HE algorithm and $x$ and $y$

are two plaintexts. Hence, most arithmetic computations can be performed directly on the

ciphertexts without first decrypting them.
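
The homomorphic property can be illustrated with two classical partially homomorphic schemes. The following is a minimal sketch, not the scheme constructed in this Dissertation: it uses textbook RSA (multiplicative) and Paillier with the common choice g = n + 1 (additive). The tiny primes, the absence of padding, and all function names are assumptions made purely for illustration.

```python
import math
import random

# Toy parameters only: real deployments use primes of 1024+ bits and padding.
p, q = 1009, 1013
n = p * q

# Textbook RSA: E(x) * E(y) mod n is an encryption of x * y mod n.
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))      # modular inverse, Python 3.8+

def rsa_encrypt(m):
    return pow(m, e, n)

def rsa_decrypt(c):
    return pow(c, d, n)

# Paillier with g = n + 1: E(x) * E(y) mod n^2 is an encryption of x + y mod n.
n_sq = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def paillier_encrypt(m, rng):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:          # r must be a unit modulo n
        r = rng.randrange(1, n)
    return pow(n + 1, m, n_sq) * pow(r, n, n_sq) % n_sq

def paillier_decrypt(c):
    return (pow(c, lam, n_sq) - 1) // n * mu % n

rng = random.Random(0)
x, y = 123, 456
product_ct = rsa_encrypt(x) * rsa_encrypt(y) % n                      # computed on ciphertexts only
sum_ct = paillier_encrypt(x, rng) * paillier_encrypt(y, rng) % n_sq   # computed on ciphertexts only
print(rsa_decrypt(product_ct), paillier_decrypt(sum_ct))
```

In both cases the party that combines the ciphertexts never sees x, y, or any secret key; only the final decryption requires the key.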

OPE preserves the order of the plaintexts, i.e., the encryption algorithm $\mathcal{E}_{OPE}$ satisfies

$x < y \iff \mathcal{E}_{OPE}(x) < \mathcal{E}_{OPE}(y)$ for any plaintexts $x$ and $y$. Thus, range search queries can be processed

directly on the ciphertexts. OPE can facilitate exact-match search as well. It is not difficult to

realize exact-match search by using deterministic encryption schemes [8, 9, 11], where the

encryption algorithm $\mathcal{E}_{DE}$ satisfies $x = y \iff \mathcal{E}_{DE}(x) = \mathcal{E}_{DE}(y)$. Thus, the equality test of the

plaintexts $x$ and $y$ can be performed directly on the ciphertexts $\mathcal{E}_{DE}(x)$ and $\mathcal{E}_{DE}(y)$. But without

knowing the order of the data, it is difficult to implement efficient search algorithms unless

content-addressable memory is used [54]. Thus, it is beneficial to use OPE for exact-match

search queries as well. Although OPE reveals the ordering information of the data,

the full plaintext still cannot be recovered from the ciphertext alone.
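
To make the range-search argument concrete, the sketch below builds a toy order-preserving map and answers a range query using only ciphertext comparisons. The map (a running sum of keyed positive gaps) is an assumption chosen for brevity; it is neither secure nor efficient and is not one of the OPE schemes studied in this Dissertation. The names `toy_ope_encrypt` and `range_search` are hypothetical.

```python
import bisect
import hashlib
import hmac

def toy_ope_encrypt(key: bytes, x: int) -> int:
    """Toy order-preserving map: the sum of keyed positive gaps for 0..x.
    It is strictly increasing in x, so x < y  <=>  E(x) < E(y).
    Illustration only: O(x) time and no security claim."""
    total = 0
    for i in range(x + 1):
        digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        total += 1 + int.from_bytes(digest[:2], "big")   # gap in [1, 65536]
    return total

def range_search(sorted_ciphertexts, enc_lo, enc_hi):
    """Server-side range query that compares ciphertexts only."""
    left = bisect.bisect_left(sorted_ciphertexts, enc_lo)
    right = bisect.bisect_right(sorted_ciphertexts, enc_hi)
    return sorted_ciphertexts[left:right]

key = b"demo-key"
plaintexts = [5, 17, 42, 8, 30, 23]
index = sorted(toy_ope_encrypt(key, v) for v in plaintexts)   # what the server stores
hits = range_search(index, toy_ope_encrypt(key, 8), toy_ope_encrypt(key, 30))
print(len(hits))   # number of stored values with 8 <= value <= 30
```

Because the sorted ciphertext list has the same order as the plaintexts, the server can use binary search (or a B+ tree) without holding the key; an exact-match query is the special case where both endpoints coincide.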

PPE requires that the length of the longest common prefix of plaintexts $x$ and $y$ equals the length of

the longest common prefix of $\mathcal{E}_{PPE}(x)$ and $\mathcal{E}_{PPE}(y)$, where $\mathcal{E}_{PPE}$ is the encryption algorithm.

Thus, the prefix-matching operation can be performed directly on the ciphertexts. PPE can also

support range search since a range search can be transformed into prefix-matching searches [48].
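
The prefix-preserving property can be checked on a small example. The sketch below assumes a well-known bitwise construction (each output bit is the input bit XORed with a keyed pseudo-random bit of the preceding input bits); it is offered only as an illustration and is not claimed to be the PPE algorithm analyzed in this Dissertation. The helper names are hypothetical.

```python
import hashlib
import hmac

def toy_ppe_encrypt(key: bytes, bits: str) -> str:
    """Encrypt a bit string so that common-prefix lengths are preserved.
    Bit i is flipped according to a keyed pseudo-random bit derived from
    the i bits that precede it."""
    out = []
    for i, b in enumerate(bits):
        prefix = bits[:i].encode()
        pad = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out.append(str(int(b) ^ pad))
    return "".join(out)

def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    count = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        count += 1
    return count

key = b"demo-key"
x, y = "10110010", "10111100"
cx, cy = toy_ppe_encrypt(key, x), toy_ppe_encrypt(key, y)
print(lcp_len(x, y), lcp_len(cx, cy))   # the two lengths agree
```

Two inputs that share their first k bits receive identical pads for those positions and for position k + 1, so their ciphertexts agree on exactly the first k bits. Encrypting a prefix also yields the corresponding prefix of the ciphertext, which is what allows prefix-matching queries to be issued in encrypted form.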

In a dataset, different users may have different access rights to its data. In some cases, the

users in an organization may have the same privilege to access the entire dataset stored at an


external service provider. Thus, all users can be treated as a single user, and we call such

systems single-user systems. In single-user systems, the master encryption keys can be

distributed to all the users since there is no need to enforce any access control policies. However,

if different users have different access rights, a user may not be able to access some

data that are readable or writable by another user. We call such systems multi-user systems.

In multi-user systems the master encryption keys cannot be distributed to all the users; otherwise

the system will not be able to enforce access control [10, 69]. Also, the server can collude with

any one of the users to compromise the entire dataset. For classical encryption schemes, a

potential solution is to use different encryption keys for different data. But it may not be easy to

design an HE/OPE/PPE scheme to support computation (such as arithmetic computation and

range search) on data encrypted using different keys. Hence, some key management and secure

communication schemes have to be established to protect the master encryption key while

allowing multiple users (with different access rights) to encrypt and decrypt data that can be used

by the data center. In the following three sections, we summarize existing secure computation

techniques and their problems. We also discuss their deficiencies in handling multi-user systems.

In Section 1.4, we discuss our efforts in improving the existing secure computation approaches

and summarize our main contributions. Section 1.5 gives the layout of the remaining part of this

dissertation.


1.1 Homomorphic Encryption

Homomorphic encryption (HE) [14, 16, 27, 31, 47, 50, 59, 62, 65, 67, 71] can be public-key

based or symmetric-key based. It is a promising solution for allowing arithmetic operations to

be performed on encrypted data. In HE, the encryption algorithm $\mathcal{E}$ satisfies

$$\mathcal{E}(x + y) = \mathcal{E}(x) \boxplus \mathcal{E}(y) \quad \text{and} \quad \mathcal{E}(x \cdot y) = \mathcal{E}(x) \boxdot \mathcal{E}(y)$$

for any plaintexts $x$ and $y$, where $\boxplus$ and $\boxdot$ refer to two (special) operations on two ciphertexts.

“Partial” HEs, which are homomorphic with respect to one arithmetic operation (either addition or

multiplication), have long been known. For instance, Paillier encryption [55] is a “partial” HE

whose encryption algorithm is homomorphic with respect to addition but not with respect to

multiplication, and the well-known RSA [60] is another “partial” HE whose encryption algorithm is

homomorphic with respect to multiplication but not with respect to addition. However, the problem of

constructing HEs that are homomorphic with respect to both addition and multiplication (some of the

literature uses the term “fully” homomorphic encryption, denoted by FHE) remained open for decades. Polly

Cracker [27] is one of the earliest proposed HE algorithms. Unfortunately, the security of

Polly Cracker has not been proved. Some approaches weaken the requirement of HE, e.g., the HE

in [14] allows only one multiplication operation, and the HE in [62] doubles the size of the

ciphertexts after each operation and, hence, allows only logarithmically many operations to be

performed on the ciphertexts. More recently, Gentry constructed the first HE [31]. Since then, many

other constructions [16, 65, 67, 71] have followed. We call those constructions the Boolean circuit

based HEs. They are based on different hard problems but share a similar design idea. In Boolean


circuit based HE, the inputs are expressed by binary strings and the computation is represented

by a Boolean circuit accordingly. The encryption algorithm adds some noise into the plaintext

such that the decryption algorithm can successfully decrypt it if the noise is below some

boundary. The binary operation in the circuit can be directly performed on the ciphertexts, but

the noise in the ciphertext will accumulate when the computation continues. Thus, a

bootstrapping process is needed to decrease the noise in the intermediate ciphertext to prevent it

from exceeding the boundary. All existing Boolean circuit based HEs have high computational

complexity. Although efforts [16, 67] have been made to decrease the complexity, the

computation is still too expensive to be applied in practice [32, 67]. Also, existing HE

schemes only consider a single key, which is impractical for a system with multiple

users. These problems are further discussed below.
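
The noise accumulation described above can be observed in a very small example. The sketch below assumes a toy symmetric scheme over the integers, in which a bit is hidden under even noise plus a multiple of a secret odd number; it is meant only to show why noise growth limits the number of operations, and it is not Gentry's construction nor any scheme from the cited works. The parameters and function names are hypothetical, and bootstrapping itself is not shown.

```python
P = 10007   # secret odd modulus; toy size, real schemes use very large parameters

def encrypt(bit, r, q):
    """Hide a bit under even noise 2*r plus a multiple of the secret P.
    In a real scheme r and q are sampled at random; they are explicit
    here so that the demonstration is reproducible."""
    return bit + 2 * r + P * q

def noise(c):
    """Centered remainder of c modulo P (roughly between -P/2 and P/2)."""
    rem = c % P
    return rem - P if rem > P // 2 else rem

def decrypt(c):
    """Correct only while the accumulated noise stays below P/2 in magnitude."""
    return noise(c) % 2

fresh = [encrypt(1, 4, q) for q in (11, 13, 17, 19)]   # four encryptions of the bit 1

added = fresh[0] + fresh[1]                 # noise adds: 9 + 9 = 18, decrypts to 1 XOR 1
product3 = fresh[0] * fresh[1] * fresh[2]   # noise multiplies: 9**3 = 729, still decryptable
product4 = product3 * fresh[3]              # 9**4 = 6561 exceeds P/2, decryption goes wrong

print(decrypt(added), decrypt(product3), decrypt(product4))
```

Addition increases the noise linearly, whereas each multiplication multiplies it, so after a few multiplications the noise wraps around the secret modulus and the recovered bit is no longer reliable. Bootstrapping is the (expensive) step that resets the noise before this happens.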

Computation time: Existing Boolean circuit based HE schemes evaluate

functions by performing binary operations on the corresponding circuit. Moreover, they require

the bootstrapping technique to decrease the noise in the ciphertexts. Because of these two

factors, the existing HE schemes are too expensive to be implemented in real applications. For

example, Gentry’s homomorphic encryption scheme [31] requires more than 900 seconds to add

two 32-bit integers and more than 18 hours to multiply two 32-bit integers (based on the

performance data given in [32]). It is therefore desirable to improve the efficiency of HE

schemes.

Single key problem: Although in HE schemes, the database does not need to know the

decryption key and can perform computation on encrypted data, the decryption key is still


needed by the users in order to decrypt the retrieved data. The existing HE schemes implicitly

assume that the users know the decryption keys. But giving the decryption key to all the users

will prevent the system from achieving proper access control. Also, the server can collude with

any user to compromise the entire database. A potential solution is to use different encryption

and decryption keys for different data. But it may not be easy to design an HE scheme to support

computations on data that are encrypted using different encryption keys.

1.2 Order-Preserving Encryption

Order-preserving encryption (OPE) [3, 6, 12, 13, 38, 39, 53] is deterministic and symmetric-key

based. It requires that the ciphertexts preserve the order of the plaintexts, i.e.,

$$x < y \iff \mathcal{E}(x) < \mathcal{E}(y)$$

for any plaintexts $x$ and $y$, where $\mathcal{E}$ is the OPE algorithm. Thus, range search can be performed

directly and efficiently on the encrypted data using conventional DBMS techniques, such as

building a B+ tree on the ciphertexts. OPE does not have perfect security since the ciphertexts

leak the ordering information of the plaintexts. But on the other hand, when it is desirable to

have a reasonable performance for range query processing while achieving a reasonable degree

of security protection, the OPE scheme can be used as long as there is a good understanding of

its security risks. Unfortunately, existing security analysis of OPE is not sufficient. Also, similar

to HE, existing OPE schemes only consider a single encryption key and can be impractical in

real systems. In the following, the problems in existing approaches are further elaborated.


Security analysis: The existing security analysis for the OPE schemes is not sufficient.

Most of them either prove the security against the author-defined attacks, or illustrate the

security based on experiments. The authors of [12] initiated the cryptographic study of OPE

schemes. They first define the ideal OPE object, in which the encryption function is selected uniformly

at random from all order-preserving functions. Since the ideal OPE object is

computationally infeasible, they construct a real OPE scheme which is computationally

indistinguishable from the ideal OPE object. Thus, the real OPE scheme achieves the security

"implied" by the ideal OPE object. However, the security of the ideal objects has not yet been

analyzed. If the security of the ideal OPE object is unacceptable, then the proof of

indistinguishability between the real OPE scheme and the ideal OPE object is not very indicative

in security assurance.
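
For a small domain, the ideal OPE object can be sampled directly, which also shows why it does not scale. The sketch below relies on the fact that a strictly increasing function from a domain of size M into a range of size N corresponds to exactly one M-element subset of the range, so choosing the subset uniformly chooses the function uniformly. The function name and parameters are hypothetical, and Python's `random` module is used only for reproducibility; it is not a cryptographic generator.

```python
import random

def sample_ideal_ope(domain_size: int, range_size: int, seed: int) -> list:
    """Pick a uniformly random order-preserving injection
    {0, ..., domain_size - 1} -> {0, ..., range_size - 1}.
    The returned table maps plaintext x to ciphertext table[x]."""
    rng = random.Random(seed)          # the seed plays the role of the key here
    images = rng.sample(range(range_size), domain_size)   # uniform subset of the range
    return sorted(images)              # sorting the subset gives the increasing function

table = sample_ideal_ope(domain_size=16, range_size=1000, seed=7)
print(table[3] < table[4] < table[5])   # order is preserved
```

The table has one entry per plaintext, so for realistic domains (for example, 64-bit integers) it cannot be stored or even generated, which is why a computationally indistinguishable real scheme is used in its place.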

Single key problem: OPE and HE have a similar security problem when applied to multi-user

systems. Unlike HE (which can be public key based or symmetric key based), OPE is

symmetric key based only. Thus, the users need the key both to read data from and to write data to the

database. Conventional OPE schemes implicitly assume that the users know the master

encryption key. But as discussed in Section 1.1, it is not secure to let all users with different

access privileges know the key. Meanwhile, it may not be easy to design an OPE scheme to support

comparisons on data if they are encrypted using different keys.


1.3 Prefix-Preserving Encryption

Prefix-preserving encryption (PPE) [4, 48, 78] is a deterministic symmetric-key based

encryption algorithm. The longest common prefix of any two ciphertexts is of the same length as

the longest common prefix of the corresponding plaintexts. More formally, given any plaintexts

x and y,

$$|\mathrm{LCP}(x, y)| = |\mathrm{LCP}(\mathcal{E}(x), \mathcal{E}(y))|$$

where $\mathcal{E}$ is the PPE algorithm and LCP denotes the function that returns the longest

common prefix of its two arguments. Since a plaintext has $x$ as a prefix if and only if

the corresponding ciphertext has $\mathcal{E}(x)$ as a prefix, prefix-matching search can be

realized in logarithmic time if the ciphertexts are organized in a standard tree data structure.

Besides prefix-matching search, PPE can also support range search on encrypted data because

a range query on $[a, b]$ can be transformed into at most $2\log_2 b - 1$ prefix-matching

queries [48]. Like OPE, the security of PPE is weakened since some prefix information of

plaintexts is leaked from ciphertexts. Thus, security analysis of PPE becomes crucial. However,

the existing security analysis of PPE is not sufficient. Also, similar to OPE, existing PPE

schemes only consider a single encryption key, which is impractical for a real system. In the

following, the problems in existing approaches are further discussed.
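
The transformation of a range query into prefix-matching queries can be sketched as follows. This is a standard decomposition of an integer interval into aligned power-of-two blocks, each of which corresponds to one bit prefix; it is given as an assumption-laden illustration rather than as the exact procedure of [48], and the function name and the fixed bit width are hypothetical.

```python
def range_to_prefixes(a: int, b: int, width: int) -> list:
    """Cover the inclusive range [a, b] of width-bit integers with bit prefixes.
    Each step takes the largest block that starts at a, is aligned to its
    own size, and does not extend past b."""
    prefixes = []
    while a <= b:
        size = a & -a if a > 0 else 1 << width   # largest alignment of a
        while size > b - a + 1:                  # shrink until the block fits
            size >>= 1
        free_bits = size.bit_length() - 1        # low-order bits left unconstrained
        if free_bits < width:
            prefixes.append(format(a >> free_bits, f"0{width - free_bits}b"))
        else:
            prefixes.append("")                  # empty prefix matches every value
        a += size
    return prefixes

prefixes = range_to_prefixes(6, 13, width=4)
print(prefixes)   # one prefix per aligned block
```

For the range [6, 13] over 4-bit values the blocks are {6, 7}, {8, ..., 11}, and {12, 13}. With a PPE scheme, each of these prefixes would be encrypted and matched against the stored ciphertexts.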

Security analysis: The existing security analysis for PPE schemes is not sufficient. Most of

the existing security proofs are either against the author-defined attacks, or based on

experiments. The authors of [4] initiated the cryptographic study of PPE schemes. Analogous to

the security analysis approach for OPE in [12], they first define the ideal PPE object where the


encryption function is uniformly randomly selected from all prefix-preserving functions, and

construct the real PPE scheme which is computationally indistinguishable from the ideal PPE

object. Thus, the real PPE scheme achieves the security "implied" by the ideal PPE object.

However, this approach has the same problem as that of OPE: the security of the ideal PPE

objects has not yet been analyzed and, hence, the security analysis for PPE schemes is not

complete.

Single key problem: The single key problem for PPE is similar to that for OPE, as discussed in

Section 1.2.

1.4 Overview and Contributions

In this Dissertation, we attempt to overcome some of the limitations on computation on

encrypted data (HE/OPE/PPE) in the existing works: We construct an efficient (non-circuit

based) HE scheme and prove its security, analyze the security of OPE and PPE schemes, and

develop mechanisms for HE/OPE/PPE to extend them to multi-user systems. We further

elaborate our contributions in this Dissertation in the following.

The contributions on HE

We construct a non-circuit based encryption scheme that is homomorphic in both

addition and multiplication. We relax the security requirement to achieve

efficiency. Although the algorithm is not semantically secure, we have proved that

when facing an adversary with up to $m \ln \mathrm{poly}(\lambda)$ chosen plaintext and ciphertext pairs,

the security of our algorithm is equivalent to the large integer factorization problem.


Here, $m$ is any predetermined constant that is polynomial in the security parameter $\lambda$.

Note that the security of the commonly used RSA encryption is no harder than the large

integer factorization problem ‎[60]. Thus, our homomorphic encryption scheme can be

used in applications where semantic security is not required and one-wayness security

is sufficient.

We conduct experiments to compare the performance of our algorithm with that of

Gentry's algorithm. When configured to withstand an attack with at least 1000 chosen plaintext and

ciphertext pairs, our algorithm performs an addition in only a tenth of a millisecond and a

multiplication in 108 milliseconds. In contrast, Gentry’s homomorphic encryption

scheme [31] requires more than 900 seconds to add two 32-bit integers and more than

18 hours to multiply two 32-bit integers (based on the performance data given in [32]).

As can be seen, our algorithm has real world applicability, especially when the large-

scale plaintext attack is not an issue.

We consider multi-user systems and propose a protocol based on similarity transform to

allow our symmetric-key based homomorphic encryption scheme to be used in such

systems. In the request protocol, the secret data in the query are encrypted under the

user's distinct key and then transformed by the database server into ciphertexts under the common master key

using the similarity transformation. The response protocol mirrors the

request protocol in reverse.

The contributions on OPE


We prove the one-wayness security of the ideal OPE object to complete the security

analysis of the OPE constructed in [12]. (A similar result is also given in [13], which appeared after our

work was published as a technical report and submitted to conferences.) According to

this result, the real OPE schemes which are computationally indistinguishable from the

ideal OPE object (e.g., the real OPE scheme constructed in [12]) also achieve

one-wayness security.

We show that the ideal OPE object is not the most secure possible OPE when the plaintext domain contains two elements. This raises the question of how to construct more secure OPE schemes for general plaintext domains. We then present two generalized OPE (GOPE) algorithms that satisfy stronger notions of security than the ideal OPE object.

We develop protocols to support multi-user data-centric systems in which any OPE can be applied to protect sensitive data that need to be searched in encrypted form. The digit-based OPE (DOPE) protocol turns any OPE into a distributed OPE: the encryption key is distributed among a group of key agents, which jointly encrypt the data so that no entity knows the key. The oblivious encryption (OE) protocol, based on the concept of oblivious transfer, is also proposed to further enhance the security of the DOPE protocol. We prove that the OE-DOPE protocol achieves one-wayness security if the underlying OPE has one-wayness security. Experiments show that our protocols have reasonable overheads.

The contributions on PPE

We define a security notion, IND-PCPA, that precisely characterizes the security of PPE. Specifically, we design the DLLCP attack to show that it is necessary to weaken the security notion from IND-CPA to IND-PCPA for the ideal PPE object. We then prove that the ideal PPE object is secure under IND-PCPA. Thus, the PPE schemes that are computationally indistinguishable from the ideal PPE object achieve the highest security level, IND-PCPA.

We develop a distributed PPE protocol to support the multi-user systems, making PPE

feasible for practical use. The major invention in this protocol is the distributed PPE

encryption by a group of key agents. We cryptographically prove the security of our

protocol by defining an ideal model for PPE protocols and showing that our PPE

protocol is computationally indistinguishable from the ideal model. Experiments are

conducted to study the performance of the protocol, showing that our protocols have

reasonable overheads.

1.5 Dissertation Layout

The rest of this dissertation is organized as follows. First, a thorough literature survey is given in Chapter 2. Specifically, it discusses the state-of-the-art technologies and research in distributed storage systems, HE, OPE, and PPE.

In Chapter 3, we introduce our system model including the single-user and multi-user

systems, database model, various encryption schemes, request and response protocols,

limitations of database encryption, and adversary model.

In Chapter 4, we design a non-circuit based homomorphic encryption scheme and extend it

to multi-user systems. We analyze the security of our algorithm and conduct the experiments to

compare the performance of our algorithm with the existing ones.

In Chapter 5, we study the security of OPE schemes, propose and construct generalized

OPE (GOPE), and extend OPE to multi-user systems. First, we prove that the ideal OPE object

achieves the one-wayness security and, hence, the real OPE schemes which are computationally

indistinguishable from the ideal OPE object also achieve the one-wayness security. Then we

show that the ideal OPE object may not be the most secure OPE and, hence, propose and

construct GOPE to improve the security of OPE. To extend OPE to multi-user systems, we develop digit-based OPE (DOPE), which can be built on any OPE, together with the corresponding basic DOPE protocol. We then strengthen the DOPE protocol into the OE-DOPE protocol using techniques including OE (oblivious encryption), vector permutation, and data mutation.

In Chapter 6, we study the security of PPE schemes and extend PPE to multi-user systems. We first introduce the security notion IND-PCPA and prove that it characterizes the security of the ideal PPE object. Thus, the real PPE schemes that are computationally indistinguishable from the ideal PPE object are also secure under IND-PCPA. We then revise an existing PPE (secure under IND-PCPA) into a “distributed” version so that it can be extended to multi-user systems.

In Chapter 7, we conclude the PhD research and discuss some future research directions.

CHAPTER 2

LITERATURE SURVEY

We consider secure computation as the capabilities of performing computation on encrypted or

secret shared data. The literature of secure computation includes multi-party computation [18,

19, 34, 80], homomorphic encryption, order-preserving encryption, and prefix-preserving

encryption.

In multi-party computation, each data item is mapped to n shares, which are distributed to n servers. The data item can be reconstructed from any t (< n) shares, but any t − 1 shares reveal no information about the original data. The servers can execute the multi-party computation protocol on the shares to achieve addition and/or multiplication of any two data items. During both storage and computation, the adversary cannot retrieve any information about the data even if it compromises t − 1 servers. Generally, computation on secret-shared data cannot be done without information exchange between the servers holding the shares. Thus, communication cost can be a concern in multi-party computation.
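The threshold sharing described above can be illustrated with Shamir's secret sharing, which is one common way to realize a t-out-of-n scheme; the surveyed works [18, 19, 34, 80] may use other constructions. The following is a minimal sketch, assuming a prime field of size 2^61 − 1 and a non-cryptographic random generator (both are choices made for this illustration only). In Shamir's scheme addition is share-wise, while multiplication (not shown) requires the servers to interact.

```python
import random

P = 2**61 - 1  # prime modulus; the field size is an assumption of this sketch

def split(secret, n, t, rng):
    """Split `secret` into n shares such that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo P
            acc = (acc * x + c) % P
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

def add_shares(a, b):
    """Each server adds the two shares it holds; the result shares the sum."""
    return [(xa, (ya + yb) % P) for (xa, ya), (_, yb) in zip(a, b)]

rng = random.Random(0)  # NOT cryptographically secure; illustration only
s1 = split(20, n=5, t=3, rng=rng)
s2 = split(22, n=5, t=3, rng=rng)
total = reconstruct(add_shares(s1, s2)[:3])  # any 3 of the 5 summed shares
```

Here `total` recovers the sum of the two secrets from three shares without any server ever seeing either secret in the clear.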

In homomorphic encryption, the data is encrypted and the computation (addition and

multiplication) on any two data can be directly performed on the ciphertexts. For order-

preserving encryption, the comparison operation on any two data can be directly performed on

the ciphertexts. For prefix-preserving encryption, the prefix matching operation on any two data

can be directly performed on the ciphertexts. In this Dissertation we focus on the secure

computation based on homomorphic encryption, order-preserving encryption, and prefix-

preserving encryption. In the following subsections, existing works on homomorphic encryption,

order-preserving encryption, and prefix-preserving encryption are discussed.

2.1 Homomorphic Encryption

Homomorphic encryption (HE) enables the arithmetic computation (addition and

multiplication) on the plaintexts to be directly performed on the ciphertexts. The main works on

HE algorithms are Boolean circuit based, where the plaintext is a single bit. All operations on

various operand types can then be achieved by constructing the corresponding circuits. In ‎[14],

an encryption scheme based on elliptic curve is proposed, which allows computation on the

ciphertexts directly if the computation involves at most one multiplication with any number of

additions. In ‎[62], a homomorphic encryption scheme has been constructed. This scheme doubles

the ciphertext for each binary operation. In [31], Gentry designs an HE scheme based on ideal lattices and uses the bootstrapping technique to clean the noise in the ciphertexts. It is semantically secure, and its security is based on the splitkey distinguishing problem. However, the computational complexity of the scheme is $\tilde{O}(\lambda^{6})$ for evaluating a gate over two bits, where $\lambda$ is the security parameter and $\tilde{O}(g(x)) = O(g(x) \log^{k} g(x))$ for some $k$. Van Dijk et al. conceptually simplify Gentry's construction by using a different hard problem, the approximate-GCD problem over the integers [71]. The authors in [67] improve the computational complexity of the HE scheme in [31] from $\tilde{O}(\lambda^{6})$ to $\tilde{O}(\lambda^{3})$, and that of the HE scheme in [71] from $\tilde{O}(\lambda^{17})$ to $\tilde{O}(\lambda^{7.25})$. The authors in [16] further improve the computational complexity of [31] to $\tilde{O}(\lambda^{2})$. In [65], a

different HE scheme is constructed based on the elementary theory of algebraic number fields.

However, all of these existing HE schemes have high time complexities [32, 67].

Another approach to the constructions of HE is non-circuit based. The idea is to construct

HE algorithms with plaintexts over a finite domain such as finite field where the addition and

multiplication can be performed directly on the ciphertexts. Compared to circuit-based

approaches, this approach can be more efficient since it does not require additional circuit

computation overhead. In ‎[27], the HE algorithm called Polly Cracker is proposed, which

encrypts a plaintext over a field by adding a random polynomial that vanishes under operations

using the technique of Gröbner bases. Following Fellows and Koblitz's work, many non-circuit based HE algorithms using Gröbner bases have been proposed [47] [50] [59]; however, they have all either been broken [17] [26] [28] [42] [68] or lack conclusive security evidence. In [5], Armknecht et al. construct a symmetric-key homomorphic encryption scheme based on coding theory, where the plaintext is encrypted to a b-dimensional codeword (vector). It is semantically secure against b − 1 known-plaintext attacks if the Decisional Synchronized Codeword Problem (DSCP) is hard. However, the scheme supports only a predetermined (fixed) number of multiplications.

2.2 Order-Preserving Encryption

Order preserving encryption (OPE) [3, 6, 12, 13, 39, 53] is a very important technique for

database related applications due to its capability of supporting range query processing [4, 15,

41, 48, 64, 66] directly on encrypted data without needing to decrypt them and expose them to

potential attackers who may have compromised the system.

There are various constructions of the OPE scheme. In [6], the proposed OPE algorithm first generates a sequence of random numbers $(r_1, \ldots, r_n, \ldots)$ and then encrypts an integer $x$ to the sum of the first $x$ random numbers (i.e., $E(x) = \sum_{1 \le i \le x} r_i$). In [39], a sequence of strictly increasing polynomial functions $(f_1, \ldots, f_n)$ is used to construct the OPE algorithm. The encryption of an integer $x$ is the outcome of the iterative application of those functions to $x$ (i.e., $E(x) = (f_1 \circ \cdots \circ f_n)(x)$). In [39], the OPE algorithm is constructed by using a mapping function composed of a partition function and an identification function. The partition function divides the plaintext domain

into multiple partitions, and the identification function assigns an ordered identifier (integer) to

each partition. Then, the mapping function maps the plaintext x to the identifier of the partition it

belongs to. Since different integers may be mapped to the same identifier, the OPE algorithm

may output false comparison results. In ‎[3], the authors construct the OPE algorithm following

three steps: modeling the input and target distributions, flattening the plaintext database into a

flat database, and transforming the flat database into the cipher database.
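The first construction above (encrypting x as the sum of the first x random numbers) is simple enough to sketch directly. The code below is only an illustration of that idea, not the algorithm of [6] as published: the function names, the `max_gap` parameter, and the use of Python's seeded `random.Random` as a stand-in for a keyed pseudorandom generator are all assumptions, and the generator is not cryptographically secure.

```python
import random
from itertools import accumulate

def make_ope_table(key, domain_size, max_gap=1000):
    """Return a table with table[x] = r_1 + ... + r_x for x in 1..domain_size."""
    rng = random.Random(key)  # stand-in for a keyed PRG; NOT secure
    # Every r_i is strictly positive, so the running sums strictly increase.
    gaps = [rng.randint(1, max_gap) for _ in range(domain_size)]
    return [0] + list(accumulate(gaps))

def ope_encrypt(table, x):
    """E(x) = sum of the first x random numbers."""
    return table[x]

table = make_ope_table(key=2012, domain_size=100)
ciphertexts = [ope_encrypt(table, x) for x in range(1, 101)]
```

Because every increment is positive, x1 < x2 implies E(x1) < E(x2), so a range predicate on plaintexts can be evaluated as the same predicate on ciphertexts.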

OPE schemes do not have perfect security since the ciphertexts leak the ordering information of the plaintexts. On the other hand, when reasonable performance for range query processing is desired together with a reasonable degree of security protection, the OPE scheme can be used as long as its security risks are well understood. However, how secure the OPE scheme is has not been sufficiently analyzed, and further research is needed to investigate its security properties. Some partial security analysis has been performed on some

OPE algorithms. In ‎[3], the authors construct an OPE scheme and analyze its security, but the

analysis has some limitations: (1) It assumes that the adversaries can only view ciphertexts.

(2) The analysis is not based on cryptographic analysis, but based on experiments, i.e., they use

Kolmogorov-Smirnov test to show that the distribution of the ciphertexts and the target

distribution cannot be distinguished. The authors in ‎[12] initiate the cryptographic study of OPE

schemes. They first define the security notion IND-OCPA where the adversary can query the

left-or-right encryption oracle with ordered plaintext pairs. An encryption scheme is secure under

IND-OCPA if the advantage of an efficient adversary (probability to distinguish whether the

returned ciphertexts are encrypted from the left or the right plaintexts) is negligible. The paper shows that the OPE scheme is susceptible to the big-jump attack and cannot be secure under IND-OCPA unless its ciphertext space is exponential in the size of the plaintext space. Consequently, there is no efficient OPE scheme that is secure under IND-OCPA for superpolynomial-sized domains. The paper then takes an alternative approach: it defines the security notion POPF-CCA and constructs an OPE scheme that is secure under POPF-CCA. In POPF-CCA, an “ideal” OPE object is defined in which the encryption function is selected uniformly at random from all order-preserving functions. For the plaintext domain $[m] = \{i \mid 1 \le i \le m\}$ and ciphertext range $[n] = \{j \mid 1 \le j \le n\}$ with, for example, $n = m^{2}$ and $m = \Omega(2^{\lambda})$, where $\lambda$ is the security parameter, it is computationally infeasible to generate the encryption function of the ideal OPE object, since doing so requires generating exponentially many (w.r.t. $\lambda$) random bits. Thus, the ideal OPE object is used as the security goal, and a “real” OPE scheme is said to be secure under POPF-CCA if it is computationally indistinguishable from the ideal OPE object. In [12], two real OPE schemes are

constructed, where a plaintext x is mapped to its ciphertext by a “binary-search-like” process in the ciphertext space (plaintext space), with the searched points being mapped back to the plaintext (ciphertext) space using the hypergeometric distribution (negative hypergeometric distribution). More specifically, let the plaintext domain be $[m_i]$ and the ciphertext range be $[n_i]$ in step $i$. The middle point $y_i \in [n_i]$ ($x_i \in [m_i]$) is mapped to $x_i \in [m_i]$ ($y_i \in [n_i]$) with the probability

$$\binom{y_i}{x_i} \binom{n_i - y_i}{m_i - x_i} \binom{n_i}{m_i}^{-1} \qquad \left( \binom{y_i - 1}{x_i - 1} \binom{n_i - y_i}{m_i - x_i} \binom{n_i}{m_i}^{-1} \right).$$

It has been proved in [12] that the real OPE scheme is computationally indistinguishable from the ideal OPE object. In other words, the real OPE scheme is secure under POPF-CCA.
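For intuition, the ideal OPE object can be sampled explicitly when the domain is tiny. An order-preserving function from [m] to [n] is determined by its image, which is an m-element subset of [n], so choosing such a subset uniformly at random chooses the function uniformly at random. The sketch below assumes small m and n (here n = m^2, as in the example above) and uses hypothetical names; for cryptographic sizes this direct sampling is exactly what is infeasible, which is why [12] resorts to the lazy, hypergeometric-based sampling.

```python
import random

def sample_order_preserving_function(m, n, rng):
    """Pick uniformly among all order-preserving functions [m] -> [n]."""
    # rng.sample returns a uniformly random m-subset of {1, ..., n};
    # sorting it yields the unique increasing function with that image.
    image = sorted(rng.sample(range(1, n + 1), m))
    return {i: image[i - 1] for i in range(1, m + 1)}

rng = random.Random(7)  # illustration only; not a cryptographic source
f = sample_order_preserving_function(m=8, n=64, rng=rng)
```

The dictionary `f` plays the role of the encryption function of the ideal object for m = 8 and n = 64.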

However, while in ‎[12], the authors reduce the security of real OPE scheme to the security

of the ideal OPE object, they do not analyze the security of the ideal OPE object. As an obvious

counter example, the ideal object is not secure when 𝑛 = 𝑚. Indeed, there exists no secure OPE

scheme when 𝑛 = 𝑚 because the encryption algorithm is necessarily the identity function.

In [12], the authors left open the question of how to measure the security of the ideal OPE object. In [13, 77], it has been shown that the ideal OPE object achieves one-wayness security and, hence, the real OPE schemes that are computationally indistinguishable from the ideal OPE object (e.g., the construction in [12]) also achieve one-wayness security. In [13], the authors also generalize the concept of OPE to EOE (efficient orderable encryption), where the ciphertexts are allowed to be non-numerical data objects, so that a dedicated comparison algorithm is needed to compare the ciphertexts. Then a “committed” EOE is constructed under the assumption that the database is static and completely known to the user in advance of encryption, so that the user can encrypt the database once and for all. The constructed “committed” EOE is proved to be secure under IND-OCPA.

2.3 Prefix-Preserving Encryption

Prefix-preserving encryption (PPE) [4, 48, 78] is a special form of encryption in which the longest common prefix of any two ciphertexts has the same length as the longest common prefix of the corresponding plaintexts. This property enables PPE to support IP address anonymization, prefix-matching search, and even range search on ciphertexts.

PPE was first proposed in ‎[78] for securely processing real-world Internet traffic traces

without disclosing the IP addresses in them. Since the private information regarding the senders

and receivers of packets may be inferred from the trace, it is highly desirable for the traffic trace

owners to anonymize the IP addresses before making them publicly available for research (e.g.,

routing performance analysis, or clustering of end-systems). However, the classical encryption

algorithms (e.g. AES) will destroy the prefix relationships among the IP addresses which are

important information for the research. Hence the authors in [78] construct a PPE to anonymize the IP addresses. It is constructed bit by bit, where the i-th bit of the ciphertext is obtained by applying an instantiating function to the previous i − 1 bits of the plaintext to preserve prefix consistency. Specifically, let $x = x_1 \ldots x_l \in \{0,1\}^{l}$ be the plaintext and $y = y_1 \ldots y_l \in \{0,1\}^{l}$ be the corresponding ciphertext. Then

$$y_i = x_i \oplus L(R(x_1, \ldots, x_{i-1}, k))$$

for $1 \le i \le l$, where “$\oplus$” denotes the XOR (exclusive or) operator, L denotes the least significant bit operator, R can be any pseudorandom function ($L \circ R$ is called the instantiating function), and k is the encryption key.
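The bitwise construction above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [78]: HMAC-SHA256 is used here as the pseudorandom function R (the source only requires some pseudorandom function), bit strings are represented as Python strings of '0' and '1', and the key and helper names are hypothetical.

```python
import hashlib
import hmac

def prf_bit(key: bytes, prefix: str) -> int:
    """L(R(prefix, k)): least significant bit of HMAC-SHA256 over the prefix."""
    digest = hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()
    return digest[-1] & 1

def ppe_encrypt(bits: str, key: bytes) -> str:
    """y_i = x_i XOR L(R(x_1 ... x_{i-1}, k)) for each bit of the plaintext."""
    return "".join(str(int(b) ^ prf_bit(key, bits[:i])) for i, b in enumerate(bits))

def ppe_decrypt(cipher: str, key: bytes) -> str:
    """Recover the plaintext left to right, reusing the prefix recovered so far."""
    plain = ""
    for c in cipher:
        plain += str(int(c) ^ prf_bit(key, plain))
    return plain

def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

key = b"demo-key"  # hypothetical key, for illustration only
x1, x2 = "00100000", "00101111"
y1, y2 = ppe_encrypt(x1, key), ppe_encrypt(x2, key)
```

Two plaintexts that agree on their first j bits receive the same mask bits for positions 1 through j + 1, so their ciphertexts agree on exactly the same j bits and differ at position j + 1, which is the prefix-preserving property.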

In ‎[4], the authors designed a PPE to support secure processing of prefix-matching queries

(such as searching area-code starting with 310), where the prefix is generalized to a sequence of

blocks (e.g. 64 bits or 4 UTF-16 characters) instead of a sequence of bits in ‎[78]. The

construction is shown in Figure 2.1. Let m = m[1]…m[l] be the plaintext partitioned into l blocks

and C = C[1]…C[l] be the ciphertext partitioned into l blocks. Each block of the ciphertext is

constructed iteratively from the plaintext by a block cipher E with the keys eK and eK’, and a

hash function H with the key hK.

    C[0] ← 0^n; m[0] ← 0^n;
    For i = 1, …, l do
        R ← m[i−1] || C[i−1];
        P[i] ← H(hK, R) ⊕ m[i];
        C[i] ← E(eK, P[i]) ⊕ H(hK, R);
    R ← m[l] || C[l];
    P[l+1] ← H(hK, R) ⊕ 0^n;
    C[l+1] ← E(eK’, P[l+1]) ⊕ H(hK, R);
    Return C[1] … C[l+1];

Figure 2.1. The PPE algorithm.

To search for matching entries with a prefix x, the system encrypts x into $\mathcal{E}(x)$ and uses $\mathcal{E}(x)$ as the prefix to perform prefix matching on the ciphertexts. Since a plaintext has a matching prefix of x if and only if the corresponding ciphertext has a matching prefix of $\mathcal{E}(x)$, the prefix-matching computation can be achieved in logarithmic time if the ciphertexts are organized in a standard tree data structure.

In [48], the authors suggested that the PPE constructed in [78] can also be used to support range search on encrypted data. For example, to search for all data in the interval [32, 111] = [00100000, 01101111], the query can be transformed into prefix-matching queries for prefixes

{001*, 010*, 0110*} where * denotes an arbitrary suffix. Generally, for a given range [a, b], the range query can be transformed into at most $2\log_2 b - 1$ prefix-matching queries.
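The transformation from a range to a set of prefixes can be sketched with a greedy cover by aligned power-of-two blocks. This is one standard way to compute such a cover and is not necessarily the procedure used in [48]; the function name and the `width` parameter (the bit length of the values) are assumptions of this sketch.

```python
def range_to_prefixes(a: int, b: int, width: int):
    """Cover the integer range [a, b] with prefix patterns over `width`-bit values."""
    prefixes = []
    while a <= b:
        # Largest power-of-two block that is aligned at `a` ...
        size = a & -a if a else 1 << width
        # ... shrunk until it no longer runs past `b`.
        while size > b - a + 1:
            size >>= 1
        free_bits = size.bit_length() - 1  # bits left unconstrained by the prefix
        prefixes.append(format(a, f"0{width}b")[: width - free_bits] + "*")
        a += size
    return prefixes

example = range_to_prefixes(32, 111, 8)
```

For the interval [32, 111] with 8-bit values, `example` is the list 001*, 010*, 0110*, matching the decomposition given above. Each prefix can then be encrypted with the PPE and submitted as a prefix-matching query.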

Like OPE, the existing work does not offer sufficient security analysis of the PPE schemes.

Most of the existing security analyses of the PPE schemes are informal: either they prove the

security of the PPE schemes against the author-defined attacks, or they illustrate the security of

the PPE schemes based on experiments. The authors in ‎[4] initiate the cryptographic study of

PPE scheme, where the security notion is defined based on the ideal PPE object (which is

analogous to OPE). The ideal PPE object is a special PPE such that the encryption function is

uniformly randomly selected from all prefix-preserving functions. Although the ideal PPE object

cannot be constructed efficiently, it is used as the security goal. A real PPE scheme is defined to be “secure” if it is computationally indistinguishable from the ideal PPE object. According to this security definition, the authors proved that their real PPE construction (in Figure 2.1) is “secure” (i.e., computationally indistinguishable from the ideal PPE object). In fact, the authors in [78] have also proved that their PPE scheme ($y_i = x_i \oplus L(R(x_1, \ldots, x_{i-1}, k))$) is computationally indistinguishable from the ideal PPE object, except that they did not use cryptographic terminology. Unfortunately, the current cryptographic security analyses of PPE schemes are not complete, since no existing work analyzes the security of the ideal PPE object.

CHAPTER 3

SYSTEM MODEL

In order to protect the database system against potential attacks, the data stored in the database

need to be encrypted so that the adversary cannot retrieve the data even if the database is

compromised. Various secure computation schemes (HE, OPE, PPE) can be used to better

protect the data so that computation can be performed directly on encrypted data without needing

to decrypt them. Thus, the database server does not need to hold the encryption keys, greatly

enhancing system security. Since data are used differently in SQL queries, they can be encrypted

differently to facilitate different types of computations. In a database system, the types of

computations that may be performed on the data are generally attribute-dependent, i.e., all the

data under a certain attribute (the same column of a table) generally have the same types of

computations on them. Thus we classify the attributes based on the potential computations on

them and determine the encryption scheme for each attribute accordingly. In Section ‎3.2, we

discuss how to classify data attributes and how to select the corresponding encryption schemes.

Then, in Section ‎3.3, we define some general notations for the HE, OPE, and PPE algorithms and

the specific properties they have to satisfy. To facilitate security analysis of the secure

computation schemes (HE, OPE, and PPE), we define the adversary model including the types of

attacks and the adversary structure, in Section ‎3.6.

When data are encrypted by HE, OPE, PPE, etc., the server can perform computation

without needing to know the encryption keys. But the keys are still needed in order to decrypt the

data (in some cases, the users also need the keys to encrypt the data). However, it is not always

possible to let all the users have the encryption keys. We consider two types of user models: single-user and multi-user systems. In single-user systems, all users (who come from the same organization) have the same access privilege to the whole database. Thus, all users can be treated as the same single user, and the encryption keys can be distributed to all users. In multi-user

systems, different users have different access privileges to different parts of the database and,

hence, the encryption key should not be given to the users. In Section ‎3.1 we discuss the detailed

single-user and multi-user models and how to manage the encryption keys in multi-user systems.

In an encrypted database, users can still send query requests to the database server and receive the corresponding responses. But unlike in a conventional database, the (plain) queries have to be transformed into a suitable (encrypted) form. In Section 3.4, we show how to

transform the queries in the encrypted database. After the server receives the encrypted queries,

the queries will be processed directly under the encrypted forms, and the (encrypted) results will

be sent back to the users. However, current secure computation schemes have limitations for

query processing and these limitations are discussed in Section ‎3.5.

3.1 Single-user and Multiple-user Systems

We consider two types of systems based on the types of users, including single-user and

multi-user systems. In a single-user system, all users have the same privilege in accessing the

database, i.e., every user can access the entire database. Thus, the users can share the same key

without security concerns. Also, all the users are treated as the same single user. In multi-user

systems, users have different access privileges to the data stored on the server. Let DB denote the server and $U = \{U_j \mid 1 \le j \le u\}$ denote the set of users. Note that single-user systems refer to the case of u = 1 and multi-user systems refer to the case of u > 1. As discussed in Chapter 1, applying HE/OPE/PPE to multi-user systems raises the single-key problem. To solve this problem, we assume that in multi-user systems a group of key agents in the set $KA = \{KA_j \mid 1 \le j \le v\}$ is deployed between the users and the DB to manage the encryption keys and mediate the communication.

Figure 3.1. Single-user and Multi-user Systems.

In a single-user system, the user and DB will authenticate each other before the user can

access the stored data. In multi-user systems, the key agents validate the access rights of each user. Additionally, they are responsible for relaying the communication between the user and DB, and for transforming the data in the messages based on various developed protocols. We assume

that any two entities in the system are connected by a public communication channel (e.g. the

internet), and the communications are protected by conventional techniques such as encryption,

authentication, digital signature, and public-key infrastructure (PKI).

3.2 Database Model

Without loss of generality, we assume that the server hosts a relational database with the schema $R(A_1, \ldots, A_n)$ and the set of attributes $A = \{A_i \mid 1 \le i \le n\}$ in both single-user and multi-user systems. The data in R need to be encrypted to protect their security. Since different

attributes may be used differently in SQL queries ‎[40], they should be encrypted by different

algorithms to allow query processing in encrypted form. A typical SQL query has the following

form

SELECT f(Ai)

FROM R

WHERE u < Aj < v AND Ak = w;

where f denotes an aggregation function such as SUM, AVG, etc. As can be seen, the data can be

used in two different ways: (1) arithmetic computation (such as computing the function f on Ai),

(2) range search (such as searching on Aj for data in between u and v) and exact match search

(such as searching on Ak for the data with key w). Some data attributes are simply stored and

retrieved without being operated on. Correspondingly, different encryption schemes are

considered for different types of data attributes: the homomorphic encryption (HE) scheme is

used for arithmetic computation attributes, the order-preserving/prefix-preserving encryption

(OPE/PPE) scheme is used for range search and exact match search attributes, and the

probabilistic encryption (PE) scheme is used for no operation attributes.
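The attribute classification above amounts to a lookup from the way an attribute is used to the encryption scheme chosen for it. A minimal sketch, with hypothetical usage labels and a hypothetical fourth attribute `Am` that is only stored and retrieved:

```python
# Usage category -> encryption scheme, following the classification in the text.
SCHEME_FOR_USAGE = {
    "arithmetic": "HE",         # e.g. SUM or AVG over the attribute
    "range_search": "OPE/PPE",  # e.g. u < Aj < v
    "exact_match": "OPE/PPE",   # e.g. Ak = w
    "none": "PE",               # stored and retrieved, never operated on
}

def choose_schemes(attribute_usage):
    """Map each attribute name to the scheme implied by its usage category."""
    return {attr: SCHEME_FOR_USAGE[usage] for attr, usage in attribute_usage.items()}

schemes = choose_schemes(
    {"Ai": "arithmetic", "Aj": "range_search", "Ak": "exact_match", "Am": "none"}
)
```

For the typical query shown earlier, this assigns HE to Ai and OPE/PPE to Aj and Ak.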

3.3 Basic Definitions of the Encryption Algorithms

In the previous section, we have introduced different attributes and the corresponding

encryption schemes required for each type of data attribute. Here, we give the basic definitions

for the encryption schemes, including HE, OPE, PPE, CDE and CPE.

An HE scheme allows the arithmetic operations to be performed directly on encrypted data. Let $(\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE})$ denote an encryption scheme, where $\mathcal{K}_{HE}$ is the corresponding key generation algorithm, and $\mathcal{E}_{HE}$ and $\mathcal{D}_{HE}$ are the encryption and decryption algorithms. We present the formal definition of HE scheme $\mathcal{SE}_{HE} = (\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE}, \mathcal{F}_{HE})$ as follows.

Definition 3.2.1: Let $\mathcal{SE}_{HE} = (\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE}, \mathcal{F}_{HE})$ be a homomorphic encryption scheme. $\mathcal{K}_{HE}$, $\mathcal{E}_{HE}$, and $\mathcal{D}_{HE}$ form an encryption scheme $(\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE})$ with $\mathcal{K}_{HE}$ being the corresponding key generation algorithm, and $\mathcal{E}_{HE}$ and $\mathcal{D}_{HE}$ being the encryption and decryption algorithms such that

$$\mathcal{D}_{HE}(\mathcal{E}_{HE}(x, k), k) = x$$

for any plaintext x and key k. $\mathcal{F}_{HE}$ is a polynomial-time algorithm such that

$$\mathcal{D}_{HE}(\mathcal{F}_{HE}(\mathcal{E}_{HE}(o_1, k), \ldots, \mathcal{E}_{HE}(o_l, k), f(x_1, \ldots, x_l)), k) = f(o_1, \ldots, o_l)$$

for any plaintexts $o_1, \ldots, o_l$ and any polynomial function $f(x_1, \ldots, x_l)$, where $\mathcal{E}_{HE}(o_i, k)$, $1 \le i \le l$, are the original ciphertexts and $\mathcal{F}_{HE}(\mathcal{E}_{HE}(o_1, k), \ldots, \mathcal{E}_{HE}(o_l, k), f(x_1, \ldots, x_l))$ is the computed ciphertext.
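The decrypt-after-evaluate identity in the definition can be checked on a very restricted example. The sketch below is not the scheme developed in this dissertation: it uses textbook RSA with toy parameters (p = 61, q = 53), which is a public-key scheme and supports only the single function f(x1, x2) = x1 · x2, whereas the definition asks for arbitrary polynomial functions under a symmetric key. It is meant only to show what "computing on ciphertexts" looks like.

```python
# Toy textbook RSA parameters: N = 61 * 53, and E * D = 1 mod lcm-compatible phi(N) = 3120.
# Far too small for real use and without padding; illustration only.
N, E, D = 3233, 17, 2753

def enc(x):
    """Encrypt a plaintext 0 <= x < N."""
    return pow(x, E, N)

def dec(c):
    """Decrypt a ciphertext."""
    return pow(c, D, N)

def eval_product(c1, c2):
    """Plays the role of F for f(x1, x2) = x1 * x2: multiply ciphertexts modulo N."""
    return (c1 * c2) % N

c = eval_product(enc(6), enc(7))  # computed ciphertext
result = dec(c)                   # equals 6 * 7 as long as the product is below N
```

The evaluator never sees 6 or 7, yet decrypting the computed ciphertext yields their product.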

An OPE scheme preserves the order of the plaintexts so that comparisons can be performed directly on encrypted data. We present the formal definition of OPE scheme $\mathcal{SE}_{OPE} = (\mathcal{K}_{OPE}, \mathcal{E}_{OPE}, \mathcal{D}_{OPE})$ as follows.

Definition 3.2.2 (OPE Scheme [12]): Suppose that $\mathcal{SE}_{OPE} = (\mathcal{K}_{OPE}, \mathcal{E}_{OPE}, \mathcal{D}_{OPE})$ is a deterministic symmetric-key encryption scheme, where $\mathcal{K}_{OPE}: \{0,1\}^{*} \to \{0,1\}^{*}$ is a key generation algorithm, $\mathcal{E}_{OPE}: [m] \times \{0,1\}^{*} \to [n]$ is a deterministic symmetric-key encryption algorithm, and $\mathcal{D}_{OPE}: [n] \times \{0,1\}^{*} \to [m]$ is a decryption algorithm such that

$$\mathcal{D}_{OPE}(\mathcal{E}_{OPE}(x, k), k) = x$$

for any plaintext x and key k. We say that $\mathcal{SE}_{OPE}$ is an OPE scheme if $\mathcal{E}_{OPE}$ satisfies the “order-preserving property”:

$$x_1 < x_2 \Rightarrow \mathcal{E}_{OPE}(x_1, k) < \mathcal{E}_{OPE}(x_2, k)$$

for any $x_1, x_2 \in [m]$ and key k.

A PPE algorithm has the prefix-preserving property: the longest common prefix of any two

ciphertexts is of the same length as the longest common prefix of the corresponding plaintexts.

Assume that the plaintexts and ciphertexts are in {0,1}l, {0,1}

l denotes the set of binary strings of

length l. Let LCP(x1, x2) denote the longest common prefix function which returns the longest

common prefix of two binary strings x1 and x2, and |LCP(x1, x2)| denote the length of LCP(x1, x2).

Then the PPE scheme 𝒮𝒠𝑃𝑃𝐸 = (𝒦𝑃𝑃𝐸 , 𝒠𝑃𝑃𝐸 , 𝒟𝑃𝑃𝐸 ) can be defined as follows.

Definition 3.2.3 (PPE Scheme [78]): A PPE scheme $\mathcal{SE}_{PPE} = (\mathcal{K}_{PPE}, \mathcal{E}_{PPE}, \mathcal{D}_{PPE})$ is a deterministic symmetric-key encryption scheme, where $\mathcal{K}_{PPE}: \{0,1\}^* \to \{0,1\}^*$ is a key generation algorithm, $\mathcal{E}_{PPE}: \{0,1\}^l \times \{0,1\}^* \to \{0,1\}^l$ is a deterministic symmetric-key encryption algorithm, and $\mathcal{D}_{PPE}: \{0,1\}^l \times \{0,1\}^* \to \{0,1\}^l$ is a decryption algorithm such that

$$\mathcal{D}_{PPE}(\mathcal{E}_{PPE}(x, k)) = x$$

for any plaintext $x$ and key $k$. The encryption algorithm $\mathcal{E}_{PPE}$ satisfies the "prefix-preserving" property:

$$|LCP(x_1, x_2)| = |LCP(\mathcal{E}_{PPE}(x_1, k), \mathcal{E}_{PPE}(x_2, k))|$$

for any $x_1, x_2 \in \{0,1\}^l$ and key $k$.
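The prefix-preserving property can be illustrated with a small sketch. It assumes the well-known bitwise form in which each ciphertext bit is the plaintext bit XORed with a keyed pseudorandom bit of the preceding plaintext prefix; HMAC-SHA256 is used here as the pseudorandom function purely for illustration, and the function names are hypothetical. It is not claimed to be the exact construction of [78].

```python
import hashlib
import hmac


def _prf_bit(key: bytes, prefix: str) -> int:
    """One pseudorandom bit derived from the key and a plaintext prefix."""
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1


def ppe_encrypt(x: str, key: bytes) -> str:
    """Encrypt a bit string: c_i = x_i XOR PRF(key, x_1 ... x_{i-1})."""
    return "".join(str(int(b) ^ _prf_bit(key, x[:i])) for i, b in enumerate(x))


def ppe_decrypt(c: str, key: bytes) -> str:
    """Recover the plaintext bit by bit, reusing the already recovered prefix."""
    x = ""
    for b in c:
        x += str(int(b) ^ _prf_bit(key, x))
    return x


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n
```

Two plaintexts that agree on their first j bits and differ at bit j+1 use the same pad bits up to and including position j+1, so their ciphertexts also agree on exactly the first j bits.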

Let $\mathcal{SE}_{PE} = (\mathcal{K}_{PE}, \mathcal{E}_{PE}, \mathcal{D}_{PE})$ denote the probabilistic encryption scheme, where $\mathcal{K}_{PE}$ is the key generation algorithm, $\mathcal{E}_{PE}$ is the probabilistic encryption algorithm, and $\mathcal{D}_{PE}$ is the decryption algorithm. Probabilistic encryption schemes are well studied and have efficient constructions in the existing literature [35, 37, 46].

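To protect the data stored on the DB, the relational database $R(A_1, \ldots, A_n)$ will be encrypted to $R^{\mathcal{E}}(A_1^{\mathcal{E}_1}, \ldots, A_n^{\mathcal{E}_n})$. $A_i^{\mathcal{E}_i}$ denotes that attribute $A_i$ is encrypted by $\mathcal{E}_i$, which is selected from $\mathcal{E}_{HE}$, $\mathcal{E}_{OPE}$, $\mathcal{E}_{PPE}$, and $\mathcal{E}_{PE}$, $1 \le i \le n$.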

3.4 Request and Response Protocols

In the query request-and-response process, users send SQL queries to DB, the queries are

processed by DB, and the results are sent back to users. In single-user systems, the user holds all

the encryption keys. When the user wants to send a request, e.g. the SQL query in Section 3.2, to DB, he/she will first authenticate himself/herself to DB by any conventional authentication mechanism. Then the user will encrypt the data in the predicate (in the WHERE clause). Specifically, the range search condition "$u < A_j < v$" will be encrypted to "$\mathcal{E}_{OPE}(u) < A_j^{OPE} < \mathcal{E}_{OPE}(v)$" and the exact match search condition "$A_k = w$" will be encrypted to "$A_k^{OPE} = \mathcal{E}_{OPE}(w)$". The transformed SQL query

SELECT $f(A_i^{HE})$

FROM $R^{\mathcal{E}}$

WHERE $\mathcal{E}_{OPE}(u) < A_j^{OPE} < \mathcal{E}_{OPE}(v)$ AND $A_k^{OPE} = \mathcal{E}_{OPE}(w)$;

will be sent to DB. DB will process the query directly on the encrypted data: it first selects all tuples $t$ satisfying $\mathcal{E}_{OPE}(u) < t(A_j^{OPE}) < \mathcal{E}_{OPE}(v)$ and $t(A_k^{OPE}) = \mathcal{E}_{OPE}(w)$, then performs $f$ on the data $t(A_i^{HE})$, and sends back the response with the encrypted query result $f(t(A_i^{HE}))$. Finally, the user decrypts the response to get the query results $f(t(A_i))$.


In multi-user systems, the classical deterministic/probabilistic encryption keys will be

distributed to the authorized users. Thus, the authorized users can access the data belonging to

the exact match search and no operation attributes, just like the situation of single-user systems.

However, as discussed in Chapter 1, it is not secure to distribute the encryption keys of

HE/OPE/PPE to users. Instead, they will be distributed to the key agents such that the key agents

will serve as the mediators between the users and DB. The details of the key distribution, request

protocol, and response protocol of HE/OPE/PPE for multi-user systems will be introduced in

Chapters 4, 5 and 6.

3.5 Limitations of Database Encryption

In the relational database $R(A_1, \ldots, A_n)$, it is possible that both computation and search operations need to be performed on some attribute $A_i$, $1 \le i \le n$. The encryption of $A_i$ then becomes more difficult. The ideal solution is to encrypt $A_i$ by an encryption algorithm that is both homomorphic and order-preserving. Unfortunately, such encryption can be very hard to achieve. To support both arithmetic and comparison operations, $A_i$ can be encrypted by both HE and OPE algorithms. In other words, the encrypted database becomes $R^{\mathcal{E}}(\ldots, A_i^{\mathcal{E}_{HE}}, A_i^{\mathcal{E}_{OPE}}, \ldots)$. However, this solution does not work fully: when an arithmetic operation is performed on data of attribute $A_i$, the result (if it is to be stored back to DB) is available only in $A_i^{\mathcal{E}_{HE}}$, not in $A_i^{\mathcal{E}_{OPE}}$.

3.6 Adversary Model

There could be internal or external attackers against the systems. For example, the entities of a multi-user system, such as users, DB, and key agents, may collude to acquire additional information that they are not authorized to access. An external attacker may eavesdrop on the communications (this type of attack will not succeed, since the communications are protected by the conventional techniques of Section 3.1) or even compromise some system entities to acquire information. We unify the possible attacking situations into a probabilistic polynomial time (PPT) adversary $\mathcal{A}$ who tries to compromise some entities in the system. If some entities are compromised, $\mathcal{A}$ may control these entities. Thus, $\mathcal{A}$ can either follow the protocols (called a passive adversary) or deviate from the protocols (called an active adversary). Generally, the active adversary can be coped with by more complicated mechanisms built on the secure mechanisms against the passive adversary [35, 36]. We therefore consider the passive adversary in this dissertation and leave the active adversary to future work.

For single-user systems, if the adversary 𝒜 compromises the user, then 𝒜 can retrieve the

identity and the encryption keys to access all the data stored on the server DB. Note that such

attack cannot be prevented. Hence, it suffices to consider the security of single-user systems

under the situation where 𝒜 compromises DB. If it happens, 𝒜 can view all the encrypted data

stored on DB. Thus, the security of single-user systems is equivalent to that of the encryption

algorithms 𝒠𝐻𝐸 , 𝒠𝑂𝑃𝐸 , 𝒠𝑃𝑃𝐸 , and 𝒠𝐶𝐸 . Since the security of the classical encryption algorithm has

been studied in many existing works, we will discuss the security of HE, OPE, and PPE in

Chapters 4, 5, and 6, respectively.

For multi-user systems, we assume that the adversary structure (the collection of all the

sets of entities which 𝒜 may compromise) is

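$$Z = \{\, U_{\mathcal{A}} \cup KA_{\mathcal{A}},\ U_{\mathcal{A}} \cup KA_{\mathcal{A}} \cup \{DB\} \mid U_{\mathcal{A}} \subset U,\ KA_{\mathcal{A}} \subset KA \,\},$$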

where 𝑈𝒜 is the set of compromised users and 𝐾𝐴𝒜 is the set of compromised key agents (note

that 𝑈𝒜 and 𝐾𝐴𝒜 could be empty). If 𝒜 compromises some users, then 𝒜 can retrieve their


identities to access the corresponding data stored on the server DB. Therefore, the security of the

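system lies in whether $\mathcal{A}$ can gain information about the data of the authorized users in $U - U_{\mathcal{A}}$.

For the security of $R^{\mathcal{E}}(A_1^{\mathcal{E}_1}, \ldots, A_n^{\mathcal{E}_n})$, it suffices to consider the security of each $A_i^{\mathcal{E}_i}$, $1 \le i \le n$. We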

will discuss the security of HE, OPE, and PPE for multi-user systems in Chapters 4, 5, and 6,

respectively.


CHAPTER 4

HOMOMORPHIC ENCRYPTION PROTOCOL

A homomorphic encryption (HE) scheme allows operations on the plaintexts to be performed directly on the ciphertexts. Consequently, clients can encrypt their critical data by HE and outsource the corresponding ciphertexts to the storage servers, so that the servers can process the data without decrypting them. Over the last three decades, the problem of HE has been studied extensively. The main works on HE algorithms are circuit-based. Unfortunately, all of the existing circuit-based HE schemes have high time complexities.

The non-circuit based HE problem is still an open one. The idea is to construct HE algorithms with plaintexts over a finite domain such as a finite field. Compared to circuit-based approaches, this approach can be more efficient since it does not require additional circuit computation and bootstrapping. However, existing non-circuit based HE algorithms have all either been broken or lack conclusive security evidence.

We construct an efficient non-circuit based encryption scheme that is homomorphic in both

addition and multiplication in this chapter. In Section ‎4.1, we prove a few preliminary lemmas

related to our algorithm and define the concept of the HE scheme. In Section ‎4.2 we construct

the HE scheme, which is based on eigenvalues and eigenvectors of matrices. We prove that when

facing an adversary with up to 𝑚 ln poly(𝜆) chosen plaintext and ciphertext pairs, the security of

our algorithm is equivalent to the large integer factorization problem. Here, 𝑚 is any

predetermined constant that is polynomial in the security parameter 𝜆. Note that the security of


the commonly used RSA encryption is no harder than the large integer factorization

problem ‎[60]. Thus, our HE scheme can be used in applications where semantic security is not

required and one-wayness security is sufficient. In Section 4.3 we extend the encryption scheme to multi-user systems by considering multiple user keys and establishing the corresponding request/response communication protocol. The data stored on the database server are encrypted using HE with a "master key". Different user keys are assigned to different users. Correspondingly, the server holds a matching key with respect to each user key. When sending a request to the server, user $C_i$ encrypts the secret data in the request using the user key, and sends the ciphertext with the request to the server. The server transforms the encryption key from the user key to the "master key" by applying a similarity transform to the ciphertext using the matching key. Similarly, when the server sends a response to user $C_i$, the ciphertext is transformed by the similarity transform using the matching key and sent with the response to $C_i$. $C_i$ decrypts it with the user key to obtain the plaintext data $d$. To prevent the user and the server from colluding to reconstruct the master key, one or more agents using the same key transformation technique can be introduced between $C_i$ and the server to enhance system security. The real-world performance of our algorithm and of the communication protocols in multi-user systems is evaluated in Section 4.4. Finally, we summarize this chapter in Section 4.5.

4.1 Preliminaries

4.1.1 The Ring 𝒁𝑵


In our homomorphic encryption scheme, all computations are performed in the ring of integers $Z_N$. Let $\lambda$ denote the security parameter and let $\mathrm{poly}(\lambda)$ denote some fixed polynomial in $\lambda$. We construct $N$ as follows. First, $2m$ prime numbers $p_i$ and $q_i$ of size $\lambda/2$ bits are chosen, where $m \ln \mathrm{poly}(\lambda)$ is the number of plaintext attacks which the algorithm can withstand. Then let $f_i = p_i q_i$ and $N = \prod_{i=1}^{m} f_i$. In the following, we show that such an $N$ can be constructed in polynomial time. Specifically, we show that it is possible to find $2m$ prime numbers of $\lambda/2$ bits for some given $m$ and $\lambda$. Note that $m$ is required to be polynomial in $\lambda$ to ensure that there are enough primes of length $\lambda/2$ bits.

Lemma 4.1.1: Given $m$ and $\lambda$, where $m = O(\mathrm{poly}(\lambda))$ (more precisely, $m \ll 2^{(\lambda-4)/2}$), it is possible to obtain $2m$ prime numbers of length $\lambda/2$ bits in polynomial time.

Proof. By the Prime Number Theorem, there are approximately $x / \ln x$ prime numbers $p \le x$. Consider primes of length $k$ bits. There are

$$\frac{2^k}{\ln 2^k} - \frac{2^{k-1}}{\ln 2^{k-1}} \approx \frac{1}{\ln 2}\left(\frac{2^k}{k} - \frac{2^{k-1}}{k-1}\right) = \frac{1}{\ln 2}\cdot\frac{(k-1)2^k - k2^{k-1}}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{2k2^{k-1} - 2^k - k2^{k-1}}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{k2^{k-1} - 2^k}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1}$$

such primes. Since we need to find $2m$ primes, at any point there are at least $\frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1} - 2m$ primes left of length $k$ bits. Note that there are $2^k - 2^{k-1} = 2^{k-1}$ integers of length $k$ bits, so at any point the probability that a random number chosen is prime is at least

$$\frac{\frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1} - 2m}{2^{k-1}} = \frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)} - \frac{2m}{2^{k-1}}.$$

For $k = \lambda/2$ this becomes

$$\frac{1}{\ln 2}\cdot\frac{\frac{\lambda}{2} - 2}{\frac{\lambda}{2}\left(\frac{\lambda}{2} - 1\right)} - \frac{2m}{2^{\lambda/2 - 1}} = \frac{1}{\ln 2}\cdot\frac{2\lambda - 8}{\lambda(\lambda - 2)} - \frac{m}{2^{(\lambda-4)/2}}.$$

If $m$ is polynomial in $\lambda$ (i.e. $m \ll 2^{(\lambda-4)/2}$), then this probability is nonnegligible. Since it is possible to check whether a number is prime in polynomial time, each of the primes can be found in polynomial time. Since $m$ is polynomial in $\lambda$, the number of primes we must find is polynomial in $\lambda$, so the time complexity of the algorithm is polynomial in $\lambda$.

Theoretically, for $\lambda = 1024$, $m$ is only bounded by $m \ll 2^{(1024-4)/2} = 2^{510}$. Therefore, for practical purposes, $m$ can be chosen to be arbitrarily large.
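The sampling argument of Lemma 4.1.1 can be sketched as rejection sampling: draw random odd integers of the required bit length and keep those that pass a primality test. The sketch below is an assumption-laden illustration: it substitutes the probabilistic Miller-Rabin test for a deterministic polynomial-time test, and the function names (`is_probable_prime`, `random_primes`, `build_modulus`) are hypothetical.

```python
import math
import random


def is_probable_prime(n: int, rng: random.Random, rounds: int = 40) -> bool:
    """Miller-Rabin test; a composite passes with probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_primes(count: int, bits: int, rng: random.Random) -> list:
    """Collect `count` distinct probable primes of exactly `bits` bits."""
    primes = set()
    while len(primes) < count:
        cand = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # top bit set, odd
        if is_probable_prime(cand, rng):
            primes.add(cand)
    return sorted(primes)


def build_modulus(m: int, lam: int, rng: random.Random):
    """Return ([f_1, ..., f_m], N) with f_i = p_i * q_i and N the product of the f_i."""
    primes = random_primes(2 * m, lam // 2, rng)
    rng.shuffle(primes)
    fs = [primes[2 * i] * primes[2 * i + 1] for i in range(m)]
    return fs, math.prod(fs)
```

Requiring the primes to be distinct keeps the $f_i$ pairwise coprime, which the later uses of the Chinese Remainder Theorem rely on.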

Next, we show that factoring any of the 𝑓𝑖 is infeasible if large integer factorization

(factoring numbers of the form 𝑓 = 𝑝𝑞 for large primes 𝑝 and 𝑞) is infeasible.

Lemma 4.1.2: Suppose that large integer factorization is infeasible. Let $f_i = p_i q_i$ for large primes $p_i$ and $q_i$, where $1 \le i \le m$. Then given $F = \{f_i\}_{i=1}^{m}$, it is infeasible to factor any of the $f_i \in F$, i.e. there does not exist a PPT (probabilistic polynomial time) algorithm that can return a factor of $f_i$ for any $1 \le i \le m$ with nonnegligible probability.

Proof. Assume that a PPT algorithm $A_1$ exists that, given a set of large integers $F = \{f_i\}_{i=1}^{m}$, can randomly factor one of the $f_i$ with some nonnegligible probability $p'$. Let $A_2$ be an algorithm to factor some large integer $f = pq$ for primes $p$ and $q$ as follows. First construct the set $F = \{f_1, \ldots, f_{m-1}, f_m = f\}$, where $f_i$ is random for $1 \le i \le m-1$, then run $A_1$ on this set $F$ and return the result. Note that $A_1$ successfully factors one of the $f_i$ with probability $p'$, and the probability that $i = m$ is $\frac{1}{m}$. Thus $A_2$ is successful with nonnegligible probability $\frac{p'}{m}$. Clearly $A_2$ is a PPT algorithm, which contradicts the assumption that large integer factorization is infeasible, completing the proof.


4.1.2 Matrices over 𝒁𝑵

In the algorithm, we need to randomly choose an invertible matrix $k \in GL_4(Z_N)$. Lemma 4.1.3 demonstrates that an arbitrary matrix over $Z_N$ is likely to be invertible, so that it is possible to choose $k$ in polynomial time. For convenience, in the next lemma, we define $N = \prod_{i=1}^{m} p_i$, since our results do not depend on the prime factors of $N$ coming in pairs.

Lemma 4.1.3: Let $N = \prod_{i=1}^{m} p_i$, where the $p_i$ are prime and $m = O(\mathrm{poly}(\lambda))$. A random matrix $T \in M_4(Z_N)$ is invertible with a high probability.

Proof. Note that an $l \times l$ matrix $T' = (\alpha_1, \ldots, \alpha_l)^t$ over a field $Z_p$, where $t$ denotes the transpose of a matrix and $\alpha_i \in Z_p^l$, is not invertible if and only if $\exists c_i \in Z_p$ and $c_{i_0} \ne 0$ for some $i_0$, s.t. $\sum_{i=1}^{l} c_i \alpha_i = 0$. The condition is equivalent to $\alpha_{i_0} = \sum_{i \ne i_0} c_i' \alpha_i$, where $c_i' = -c_{i_0}^{-1} c_i$. Since $T'$ has $l^2$ entries, there are $p^{l^2}$ possibilities for $T'$ in total. Since $i \ne i_0$, there are a total of $p^{l-1}$ possibilities for the $c_i'$, and $p^{l^2-l}$ possibilities for the $\alpha_i$. Hence, given $T'$, the probability that $\alpha_{i_0} = \sum_{i \ne i_0} c_i' \alpha_i$ is $\frac{p^{l-1} p^{l^2-l}}{p^{l^2}} = \frac{1}{p}$. Note that $1 \le i_0 \le l$, so $T'$ is not invertible with probability at most $\frac{l}{p}$.

We show that a matrix $T$ over the ring $Z_N$ is invertible if and only if it is invertible over every field $Z_{p_i}$, where $1 \le i \le m$. Note that if $T$ is invertible over $Z_N$, then we can construct $T^{-1} \bmod p_i$. Since $TT^{-1} = I$ (the identity matrix), $TT^{-1} = I \bmod p_i$, so $T^{-1} \bmod p_i$ is the inverse of $T \bmod p_i$, and $T$ is invertible over each field $Z_{p_i}$. Now assume that $T$ is invertible over each field $Z_{p_i}$ with inverse $S_i$, and consider the set of linear congruences $S = S_i \bmod p_i$, which has a unique solution over $Z_N$ by the Chinese Remainder Theorem. Then we have the set of linear congruences $TS = TS_i \bmod p_i = I \bmod p_i$. But $I$ is clearly a solution to these congruences, and by the Chinese Remainder Theorem, this solution is unique over $Z_N$, so $TS = I$, and $T$ is invertible with inverse $S$. As shown above, this means that $T \in M_4(Z_N)$ is not invertible with negligible probability

$$1 - \prod_{i=1}^{m}\left(1 - \frac{4}{p_i}\right) \le \sum_{i=1}^{m} \frac{4}{p_i} \le \frac{4m}{\min_{1 \le i \le m} p_i}.$$

This completes the proof.
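Lemma 4.1.3 can be checked empirically. The sketch below relies on the standard fact that a square matrix over $Z_N$ is invertible exactly when its determinant is a unit modulo $N$, i.e. coprime to $N$; the determinant is computed by cofactor expansion, which is adequate for $4 \times 4$ matrices. The function names and the small test primes are illustrative assumptions.

```python
import math
import random


def det(M):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def is_invertible_mod(M, N: int) -> bool:
    """M is invertible over Z_N iff gcd(det(M), N) == 1."""
    return math.gcd(det(M) % N, N) == 1


def invertible_fraction(primes, trials: int, rng: random.Random) -> float:
    """Fraction of uniformly random 4x4 matrices over Z_N that are invertible."""
    N = math.prod(primes)
    hits = 0
    for _ in range(trials):
        T = [[rng.randrange(N) for _ in range(4)] for _ in range(4)]
        hits += is_invertible_mod(T, N)
    return hits / trials
```

With the primes 1009 and 1013, the lemma bounds the failure probability by $4 \cdot 2 / 1009 < 0.8\%$, so nearly every sampled matrix should be usable as a key.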

4.2 The Homomorphic Encryption Scheme

In this section, we present a fully homomorphic encryption scheme (𝒦, 𝒠, 𝒟, 𝒡) , and

prove that the algorithm is secure under the assumption that it is infeasible to factor large

integers of the form 𝑓 = 𝑝𝑞 for large primes 𝑝 and 𝑞. In Subsection ‎4.2.1 we introduce the idea

of the design. In Subsection ‎4.2.2, we discuss the encryption, decryption, and key generation

algorithms (𝒦, 𝒠, 𝒟). Subsection ‎4.2.3 proves the security of (𝒦, 𝒠, 𝒟) and Subsection ‎4.2.4

derives time complexity of the encryption and decryption schemes. Subsection ‎4.2.5 gives the

computation algorithm 𝒡 and proves the correctness of 𝒡. The complexity of 𝒡 is derived in

Subsection ‎4.2.6.

4.2.1 Design Concept

We first discuss the design idea of our homomorphic encryption scheme. We start from Rabin's encryption algorithm. Given a plaintext $x$, the encryption algorithm is

$$\mathcal{E}(x) = x^2 \bmod N$$

where $N = f = pq$. Although Rabin's encryption algorithm is homomorphic in multiplication, it is not homomorphic in addition. Thus, we can try to generalize the ciphertext domain from $Z_N$ to the ring of matrices over $Z_N$. In particular, consider the encryption algorithm

$$\mathcal{E}_1(x) = \begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} \bmod N.$$

The homomorphic properties in addition and multiplication can be easily verified.
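As a quick check, write $r_x$ and $r_y$ for the second diagonal entries used when encrypting $x$ and $y$ (these two labels are introduced here for illustration only). Then

```latex
\begin{align*}
\mathcal{E}_1(x) + \mathcal{E}_1(y)
  &= \begin{pmatrix} x + y & 0 \\ 0 & r_x + r_y \end{pmatrix} \bmod N, \\
\mathcal{E}_1(x)\,\mathcal{E}_1(y)
  &= \begin{pmatrix} x y & 0 \\ 0 & r_x r_y \end{pmatrix} \bmod N,
\end{align*}
```

so the top-left entry of the sum (respectively the product) of two ciphertexts is the sum (respectively the product) of the two plaintexts.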

However, since $x$ is the eigenvalue of the eigenvector $v_{1,0} = (1,0)^t$, the adversary can easily recover $x$ from the ciphertext by solving the linear equation system $\mathcal{E}_1(x)\, v_{1,0} = x\, v_{1,0}$.

To cope with the problem above, we apply a randomly selected similarity transform $k$ to $\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix}$, which is the ciphertext in $\mathcal{E}_1$ and which we call the pre-transformed ciphertext from now on. Note that the eigenvector corresponding to $x$ is also transformed by $k$. As a result, the encryption algorithm becomes

$$\mathcal{E}_2(x, k) = k^{-1} \begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} k \bmod N$$

where $k$ is a randomly selected $2 \times 2$ invertible matrix. We give some informal reasoning as to why such an algorithm should be secure. Note that the similarity transformation transforms the eigenvector of $x$ from $v_{1,0}$ to $k^{-1} v_{1,0}$. Since the adversary does not know the key $k$, he/she does not know the transformed eigenvector, so he/she cannot establish the linear equation system to obtain the plaintext. Also, although the adversary can derive the characteristic equation

$$\det(zI - \mathcal{E}_2(x, k)) = 0 \bmod N \;\Leftrightarrow\; z^2 - (x + r)z + xr = 0 \bmod N,$$

it is infeasible for the adversary to solve the quadratic equation, because this is equivalent to the factorization of $N$ (the security of Rabin's encryption algorithm is also based on the infeasibility of solving quadratic equations in $Z_N$).

Unfortunately, the encryption algorithm $\mathcal{E}_2$ cannot resist the chosen plaintext attack. Suppose that the adversary gets the plaintext and ciphertext pair

$$\left(x,\ \mathcal{E}_2(x, k) = k^{-1} \begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} k\right).$$

Then the adversary can establish the equation $(\mathcal{E}_2(x, k) - xI)\, v = 0$ and derive the transformed eigenvector because

$$(\mathcal{E}_2(x, k) - xI)\, v = 0 \;\Leftrightarrow\; k^{-1}\left(\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} - xI\right) k v = 0 \;\Leftrightarrow\; \begin{pmatrix} 0 & 0 \\ 0 & r - x \end{pmatrix} k v = 0 \;\Leftrightarrow\; kv = v_{1,0} \;\Leftrightarrow\; v = k^{-1} v_{1,0}.$$

Consequently, the adversary can solve for $y$, if given an additional ciphertext $\mathcal{E}_2(y, k) = k^{-1} \begin{pmatrix} y & 0 \\ 0 & r \end{pmatrix} k \bmod N$, by solving the linear equation system $\mathcal{E}_2(y, k)\, v = y\, v$.

One remedy is to use different keys to encrypt different plaintexts, but then the homomorphic properties in addition and multiplication are lost. Also, increasing the number of primes in $N$ cannot improve security, since the attack does not depend on the number of primes in $N$.
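Both the homomorphic property of $\mathcal{E}_2$ and the chosen plaintext attack on it can be reproduced in a few lines. The sketch below uses toy parameters chosen for illustration only: the modulus $1009 \times 1013$, the key, and the helper names are all assumptions. The attack exploits the fact that for a singular $2 \times 2$ matrix with first row $(a, b)$, the vector $(b, -a)^t$ lies in its null space.

```python
N = 1009 * 1013  # toy modulus, far too small for real use


def mul2(A, B):
    """2x2 matrix product modulo N."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) % N for j in range(2)]
            for i in range(2)]


def inv2(k):
    """Inverse of a 2x2 matrix modulo N via the adjugate formula."""
    d = pow((k[0][0] * k[1][1] - k[0][1] * k[1][0]) % N, -1, N)
    return [[k[1][1] * d % N, -k[0][1] * d % N],
            [-k[1][0] * d % N, k[0][0] * d % N]]


def enc2(x, r, k):
    """E_2(x, k) = k^{-1} diag(x, r) k mod N."""
    return mul2(mul2(inv2(k), [[x, 0], [0, r]]), k)


def dec2(C, k):
    """Top-left entry of k C k^{-1}."""
    return mul2(mul2(k, C), inv2(k))[0][0]


def null_vector(x, C):
    """From one known pair (x, C): a vector v with (C - xI) v = 0 mod N."""
    a, b = (C[0][0] - x) % N, C[0][1] % N
    return [b, -a % N]


def recover(v, C2):
    """Solve C2 v = y v for y using the first coordinate (v[0] must be a unit)."""
    w0 = (C2[0][0] * v[0] + C2[0][1] * v[1]) % N
    return w0 * pow(v[0], -1, N) % N
```

For example, with key `[[3, 5], [7, 11]]`, one known pair for plaintext 42 is enough for `recover` to read off the plaintext of any other ciphertext under the same key, without ever learning the key.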

To improve the encryption algorithm so that it can withstand the chosen plaintext attack,

we associate the eigenvalue 𝑥 with two eigenvectors 𝑣1 and 𝑣2 instead of one. We choose the

same 𝑣1 for all plaintexts so that the homomorphic properties in addition and multiplication are

guaranteed. On the other hand, $v_2$ cannot be the same for all plaintexts; otherwise, any linear combination of $v_1$ and $v_2$ is an eigenvector for all plaintexts, and the scheme reduces to the situation of $\mathcal{E}_2$. Instead, we randomly choose $v_2$ following a probability distribution $D$ (to be given). Note that since $x$ is associated with two eigenvectors and there are choices in $v_2$, we need to work with matrices of a higher dimension. For example, we can have $v_1 = (1,0,0,0)^t$, and $v_2$ can be randomly chosen between $(0,1,0,0)^t$ and $(0,0,1,0)^t$ following $D$.

When encrypting a plaintext, we construct the pre-transformed ciphertext 𝐶 with 𝑣1 and 𝑣2

as eigenvectors of 𝐶 corresponding to eigenvalue 𝑥. Then we perform a similarity transformation

with key 𝑘. Let 𝒠3 denote this encryption scheme.

Now, the adversary has to derive the transformed eigenvector 𝑘−1 𝑣1 in order to

compromise the encryption scheme. Suppose that the adversary gets a single plaintext and

ciphertext pair. Then he/she cannot derive 𝑘−1 𝑣1 because any linear combination of 𝑘−1𝑣1 and

𝑘−1 𝑣2 is an eigenvector of 𝐶 with eigenvalue x.

Next consider the case where the adversary gets two pairs of plaintexts and ciphertexts. If

𝑣2 = 𝑣1 , then, same as above, the adversary cannot derive 𝑣1 . If 𝑣2 ≠ 𝑣1 , then, the probability of

success with which the adversary can derive $k^{-1} v_1$ after an attack with $m'$ chosen plaintexts follows a probability distribution related to $D$. Without loss of generality, let $p_{m'}$ denote the probability for the adversary to derive $v_1$ after an attack with $m'$ chosen plaintext and ciphertext pairs. To improve the strength of encryption, we want to apply $\mathcal{E}_3$ multiple times. More precisely, let $N = \prod_{i=1}^{m} f_i$, where $f_i = p_i q_i$. For each $i$, we apply $\mathcal{E}_3$ to encrypt the plaintext $x$ over $Z_{f_i}$. Then, the ciphertext over $Z_N$ is obtained by applying the Chinese Remainder Theorem to the individual encryption results over $Z_{f_i}$, for all $i$. Let $\mathcal{E}_4$ denote this new encryption scheme.

Now, the adversary has to derive the corresponding eigenvectors over all $Z_{f_i}$ in order to recover the plaintext over $Z_N$. The probability for the adversary to derive all corresponding eigenvectors over all $Z_{f_i}$ after an attack with $m'$ chosen plaintexts decreases to $p_{m'}^{m}$. Hence, by carefully selecting parameters $D$ and $m$, $\mathcal{E}_4$ can resist the chosen plaintext attack with a predetermined number $m'$ of plaintexts.

The discussion above is to give a high level concept of our design of a non-circuit based

homomorphic encryption scheme. We will formally prove the security of the encryption scheme

by reduction to the large integer factorization problem. In particular we will show that if there

exists a PPT algorithm that can reverse the ciphertext with nonnegligible probability after the

chosen plaintext attack with 𝑚’ plaintexts, then there exists a PPT algorithm to factor 𝑁.

4.2.2 Encryption and Decryption Algorithms

To formally set up the encryption algorithm following the concept for constructing 𝒠4, we

first pick a random 4 × 4 invertible matrix 𝑘 as the key for our encryption scheme. This can be

done efficiently as proved in Lemma 4.1.3. We then construct the diagonal matrix $\mathrm{diag}(x, a, b, c)$, where $a$, $b$, and $c$, the solutions to a set of linear congruences depending on $x$ and a random value $r \in Z_N$, are computed using the Chinese Remainder Theorem. The corresponding ciphertext $C$ is the similarity transform of this matrix by $k$, $C = k^{-1}\mathrm{diag}(x, a, b, c)k$. The encryption algorithm is formally presented as follows.

Given $\{f_i\}_{i=1}^{m}$ with $N = \prod_{i=1}^{m} f_i$, and a plaintext $x \in Z_N$, we encrypt $x$ into a ciphertext $C \in M_4(Z_N)$ as follows:


1. Choose a random value $r \in Z_N$.

2. Define a set of numbers $a_i$, $b_i$, and $c_i$, for $1 \le i \le m$, as follows. For each $i$, exactly one of $a_i$, $b_i$, and $c_i$ is equal to $x$. Let $a_i = x$ with probability $1 - \frac{1}{m+1}$, $b_i = x$ with probability $\frac{1}{2(m+1)}$, and $c_i = x$ with probability $\frac{1}{2(m+1)}$. Set the other two values equal to $r$. This way, for each $i$, one of the values $a_i$, $b_i$, and $c_i$ equals $x$ and the other two equal $r$.

3. By the Chinese Remainder Theorem, let $a$, $b$, $c$ be the solutions to the sets of simultaneous congruences $a = a_i \bmod f_i$, $b = b_i \bmod f_i$, and $c = c_i \bmod f_i$, for $1 \le i \le m$.

4. Let $C = k^{-1}\mathrm{diag}(x, a, b, c)k$.

Given ciphertext $C$ and key $k$, the decryption algorithm computes the plaintext $x = (kCk^{-1})_{00}$.

The correctness of the encryption scheme is proved in Lemma 4.2.1.

Lemma 4.2.1: The encryption scheme $(\mathcal{K}, \mathcal{E}, \mathcal{D})$ is correct.

Proof. Note that $\big((k^{-1})^{-1}\,(k^{-1}\mathrm{diag}(x, a, b, c)k)\,k^{-1}\big)_{00} = \mathrm{diag}(x, a, b, c)_{00} = x$.

The following is an example of our encryption and decryption algorithms. Let $m = 2$, and consider the values $f_1 = 3 \times 5 = 15$ and $f_2 = 2 \times 7 = 14$, so that $N = 210$. We randomly choose key

$$k = \begin{pmatrix} 17 & 44 & 169 & 126 \\ 91 & 121 & 84 & 85 \\ 85 & 71 & 119 & 25 \\ 0 & 85 & 201 & 44 \end{pmatrix} \quad\text{with inverse}\quad k^{-1} = \begin{pmatrix} 35 & 31 & 29 & 0 \\ 44 & 57 & 113 & 29 \\ 74 & 157 & 37 & 194 \\ 59 & 27 & 152 & 103 \end{pmatrix}.$$

We encrypt the plaintext $42 \in Z_{210}$ using our encryption algorithm. First we randomly choose $r = 91 \in Z_{210}$. Next, we let $a_1 = x = 42$ and $b_1 = c_1 = r = 91$, and let $a_2 = c_2 = r = 91$ and $b_2 = x = 42$. Then we use the Chinese Remainder Theorem to solve for the values $a$, $b$, and $c$, where $a = 42 \bmod 15$ and $a = 91 \bmod 14$, where $b = 91 \bmod 15$ and $b = 42 \bmod 14$, and where $c = 91 \bmod 15$ and $c = 91 \bmod 14$, and get $a = 147$, $b = 196$, and $c = 91$. Finally we construct our ciphertext,

$$C = k^{-1}\,\mathrm{diag}(42, 147, 196, 91)\,k = \begin{pmatrix} 35 & 31 & 29 & 0 \\ 44 & 57 & 113 & 29 \\ 74 & 157 & 37 & 194 \\ 59 & 27 & 152 & 103 \end{pmatrix} \begin{pmatrix} 42 & 0 & 0 & 0 \\ 0 & 147 & 0 & 0 \\ 0 & 0 & 196 & 0 \\ 0 & 0 & 0 & 91 \end{pmatrix} \begin{pmatrix} 17 & 44 & 169 & 126 \\ 91 & 121 & 84 & 85 \\ 85 & 71 & 119 & 25 \\ 0 & 85 & 201 & 44 \end{pmatrix} = \begin{pmatrix} 77 & 91 & 154 & 35 \\ 35 & 84 & 49 & 189 \\ 175 & 133 & 140 & 119 \\ 35 & 98 & 49 & 175 \end{pmatrix}.$$

Then we decrypt our ciphertext and extract the plaintext $x$,

$$x = (kCk^{-1})_{00} = \left( \begin{pmatrix} 17 & 44 & 169 & 126 \\ 91 & 121 & 84 & 85 \\ 85 & 71 & 119 & 25 \\ 0 & 85 & 201 & 44 \end{pmatrix} \begin{pmatrix} 77 & 91 & 154 & 35 \\ 35 & 84 & 49 & 189 \\ 175 & 133 & 140 & 119 \\ 35 & 98 & 49 & 175 \end{pmatrix} \begin{pmatrix} 35 & 31 & 29 & 0 \\ 44 & 57 & 113 & 29 \\ 74 & 157 & 37 & 194 \\ 59 & 27 & 152 & 103 \end{pmatrix} \right)_{00} = \begin{pmatrix} 42 & 0 & 0 & 0 \\ 0 & 147 & 0 & 0 \\ 0 & 0 & 196 & 0 \\ 0 & 0 & 0 & 91 \end{pmatrix}_{00} = 42.$$
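The encryption and decryption steps, together with the worked example for $N = 210$, can be sketched as follows. This is an illustrative implementation under stated assumptions: the matrix inverse is computed with the adjugate formula (robust over $Z_N$ for $4 \times 4$ matrices), the optional `r` and `slots` arguments exist only so that the fixed choices of the example can be replayed, and all function names are hypothetical.

```python
import math
import random


def mat_mul(A, B, N):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % N for j in range(n)]
            for i in range(n)]


def det(M):
    """Exact integer determinant by cofactor expansion."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * e * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j, e in enumerate(M[0]))


def mat_inv(M, N):
    """Inverse over Z_N as adjugate * det^{-1}; raises ValueError if not invertible."""
    n = len(M)
    d_inv = pow(det(M) % N, -1, N)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
        return (-1) ** (i + j) * det(minor)

    return [[cofactor(j, i) * d_inv % N for j in range(n)] for i in range(n)]


def crt(residues, moduli):
    """Solve z = residues[i] mod moduli[i] for pairwise coprime moduli."""
    x, M = 0, 1
    for r, f in zip(residues, moduli):
        t = (r - x) * pow(M, -1, f) % f
        x, M = x + M * t, M * f
    return x % M


def encrypt(x, key, fs, rng=random, r=None, slots=None):
    """C = key^{-1} diag(x, a, b, c) key mod N, following steps 1 to 4."""
    N, m = math.prod(fs), len(fs)
    if r is None:
        r = rng.randrange(N)                       # step 1
    if slots is None:                              # step 2: 0 -> a_i, 1 -> b_i, 2 -> c_i
        slots = rng.choices([0, 1, 2], weights=[2 * m, 1, 1], k=m)
    a, b, c = (crt([x if slots[i] == s else r for i in range(m)], fs)
               for s in range(3))                  # step 3
    D = [[0] * 4 for _ in range(4)]
    for i, v in enumerate((x, a, b, c)):
        D[i][i] = v % N
    return mat_mul(mat_mul(mat_inv(key, N), D, N), key, N)  # step 4


def decrypt(C, key, fs):
    """x = (key C key^{-1})_{00} mod N."""
    N = math.prod(fs)
    return mat_mul(mat_mul(key, C, N), mat_inv(key, N), N)[0][0]
```

Calling `encrypt(42, K, [15, 14], r=91, slots=[0, 1])` with the key of the example replays the choices $a_1 = x$ and $b_2 = x$; the weights `[2m, 1, 1]` encode the probabilities $1 - \frac{1}{m+1}$, $\frac{1}{2(m+1)}$, $\frac{1}{2(m+1)}$ of step 2.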


4.2.3 Security of the Encryption Scheme

We proceed with the proof of security by reductions. In Lemma 4.2.2 we establish the existence of matrices $k_i$, which will be used in the proof of security because, as will be shown in Lemma 4.2.3, given the ciphertext $C$, an adversary cannot distinguish between $\mathcal{E}(x, k)$ and $\mathcal{E}(y, k_i k)$. Lemma 4.2.4 demonstrates that this fact implies that, if a PPT algorithm exists to extract the plaintext $x$, with nonnegligible probability, from the ciphertext $C$ without the key $k$, then a PPT algorithm exists to factor $f_i \in F$, for some arbitrarily chosen $F = \{f_i\}_{i=1}^{m}$ and some $i$, with nonnegligible probability. Then, by Lemma 4.1.2, there would exist a PPT algorithm for large integer factorization with nonnegligible probability. Lemmas 4.2.5 and 4.2.6 prove that, given $m' \le m$ plaintext and ciphertext pairs $\{(x_l, C_l)\}_{l=1}^{m'}$, if a PPT algorithm exists to find the plaintext $x$ given the corresponding ciphertext $C$ with nonnegligible probability, then a PPT algorithm exists to factor $f_i \in F = \{f_i\}_{i=1}^{m}$, implying that there exists a PPT algorithm for large integer factorization with nonnegligible probability. Finally, Theorem 4.2.7 proves the same result with the weaker condition $m' \le m \ln \mathrm{poly}(\lambda)$. In other words, given fewer than $m \ln \mathrm{poly}(\lambda)$ plaintext and ciphertext pairs, decryption of a ciphertext without the key is at least as hard as the well-known integer factorization problem.

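Lemma 4.2.2: For $1 \le i \le m$, there exists a unique element $k_i \in GL_4(Z_N)$ so that

$$k_i = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i,$$

$k_i = I \bmod q_i$, and $k_i = I \bmod f_j$ for $j \ne i$, where $I$ is the identity matrix in $GL_4(Z_N)$. Additionally, $k_i = k_i^{-1}$.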


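Proof. The first claim follows from the Chinese Remainder Theorem. The fact that $k_i = k_i^{-1}$ follows from the fact that

$$k_i^2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \bmod p_i = I \bmod p_i$$

and the trivial fact that $I = I^{-1}$.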

Lemma 4.2.3: Given plaintext $x$, key $k$ and random element $r$, there exist $y$ and $s$ such that $\mathcal{E}(x, k) = \mathcal{E}(y, k_i k)$. Additionally, $q_i$ divides $y - x$, and $p_i$ does not divide $y - x$ with probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$.

Proof. Note that $C' = \mathrm{diag}(x, a, b, c)$ satisfies the congruences $C' = \mathrm{diag}(x, a_i, b_i, c_i) \bmod f_i$, so

$$k_i C' k_i^{-1} \bmod p_i = k_i\,\mathrm{diag}(x, a_i, b_i, c_i)\,k_i^{-1} \bmod p_i = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x & 0 & 0 & 0 \\ 0 & a_i & 0 & 0 \\ 0 & 0 & b_i & 0 \\ 0 & 0 & 0 & c_i \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i = \mathrm{diag}(a_i, x, c_i, b_i) \bmod p_i.$$

Also, $k_i C' k_i^{-1} \bmod q_i = I C' I \bmod q_i = C' \bmod q_i$ and similarly $k_i C' k_i^{-1} \bmod f_j = C' \bmod f_j$ for $j \ne i$. Let $D' = \mathrm{diag}(y, a', b', c')$. Then the set of congruences $D' = \mathrm{diag}(a_i, x, c_i, b_i) \bmod p_i$, $D' = C' \bmod q_i$, and $D' = C' \bmod f_j$ has a unique solution by the Chinese Remainder Theorem. Note that this solution satisfies $D' = k_i C' k_i^{-1}$, so $k_i^{-1} D' k_i = C'$ and $(k_i k)^{-1} D' (k_i k) = k^{-1} k_i^{-1} D' k_i k = k^{-1} C' k = \mathcal{E}(x, k)$. But this means that $\mathcal{E}(x, k) = (k_i k)^{-1} D' (k_i k) = \mathcal{E}(y, k_i k)$, proving the first claim. Note that additionally $y = x \bmod q_i$, so that $q_i$ divides $y - x$, but $y = a_i \bmod p_i$. Note that $a_i = r$ (i.e., $a_i \ne x$) with probability $\frac{1}{m+1}$. Additionally, $r \ne x \bmod p_i$ with probability $1 - \frac{1}{p_i}$ since $r$ is chosen uniformly at random from $Z_N$, and thus with probability $1 - \frac{1}{p_i}$, $p_i$ does not divide $r - x$. Thus $q_i$ divides $y - x$, and $p_i$ does not divide $y - x$ with probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$, proving the second claim.
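Lemmas 4.2.2 and 4.2.3 can be traced on the toy modulus $N = 210$ with $f_2 = 14 = 2 \times 7$, taking $p_2 = 2$, $q_2 = 7$, and the pre-transformed ciphertext $\mathrm{diag}(42, 147, 196, 91)$ (so $x = 42$ and $a_2 = r = 91$). The sketch below builds $k_i$ entry by entry with the Chinese Remainder Theorem and shows that the alternative plaintext $y$ leaks the factor $q_i$ through a gcd; the helper names are hypothetical, and the toy values are for illustration only.

```python
from math import gcd

P = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]


def mat_mul(A, B, N):
    return [[sum(A[i][t] * B[t][j] for t in range(4)) % N for j in range(4)]
            for i in range(4)]


def crt2(r1, m1, r2, m2):
    """Solve z = r1 mod m1 and z = r2 mod m2 for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return (r1 + m1 * t) % (m1 * m2)


def build_ki(p_i, N):
    """k_i = P mod p_i and k_i = I mod N/p_i (this covers q_i and every f_j, j != i)."""
    rest = N // p_i
    return [[crt2(P[r][c], p_i, I4[r][c], rest) for c in range(4)]
            for r in range(4)]


def diag(*vals):
    return [[vals[i] if i == j else 0 for j in range(4)] for i in range(4)]


N, f_i, p_i, q_i = 210, 14, 2, 7
x = 42
C_prime = diag(42, 147, 196, 91)          # diag(x, a, b, c)
k_i = build_ki(p_i, N)
# k_i is its own inverse, so D' = k_i C' k_i^{-1} = k_i C' k_i.
D_prime = mat_mul(mat_mul(k_i, C_prime, N), k_i, N)
y = D_prime[0][0]                          # the alternative plaintext
factor = gcd(f_i, (y - x) % N)             # reveals q_i when a_i != x mod p_i
```

Here $y - x$ is a multiple of $q_2 = 7$ but not of $p_2 = 2$, which is exactly the situation that the reduction to factoring exploits.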

Lemma 4.2.4: If a PPT algorithm $A_d(C)$ exists that, given $C = \mathcal{E}(x, k)$, returns $x$ with probability $p$, then there exists a PPT algorithm $A_f$ to return a factor of $f_i$ for some $i$ with probability $p' = \frac{p}{m+1}\left(1 - \frac{1}{p_i}\right)$.

Proof. Let the algorithm $A_f$ first choose a random plaintext $x$ and a random key $k$, and construct ciphertext $C = \mathcal{E}(x, k)$ using the encryption scheme. Then, $A_f$ runs $A_d(C)$ to obtain value $o$. Then $A_f$ returns $\gcd(f_i, o - x)$. Note that $A_f$ is clearly a PPT algorithm assuming that $A_d$ is a PPT algorithm. We also show that $A_f$ is correct with some probability $p'$. Note that since $\mathcal{E}(x, k) = \mathcal{E}(y, k_i k)$, $A_d$ will also return $y$ with probability $p$. Note that $q_i = \gcd(f_i, o - x) = \gcd(f_i, y - x)$ with probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$, since $q_i$ divides $y - x$ but $p_i$ does not with this probability. Since $f_i$ has only the factors $p_i$ and $q_i$, the GCD of this pair of numbers is thus $q_i$. If $p$ is nonnegligible, then $A_f$ is thus correct with nonnegligible probability $p' = \frac{p}{m+1}\left(1 - \frac{1}{p_i}\right)$. Clearly $A_f$ is a PPT algorithm if $A_d$ is a PPT algorithm, completing the proof.
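The final step of the reduction, recovering $q_i$ from $\gcd(f_i, y - x)$, can be illustrated with a small Python sketch. The primes, the plaintext, and the random element below are assumed toy values, and `crt_pair` is a hypothetical helper that plays the role of the Chinese Remainder Theorem step; in the proof, $y$ would be the value returned by the decryption oracle rather than constructed directly.

```python
from math import gcd


def crt_pair(r_p, p, r_q, q):
    """Return the unique y in [0, p*q) with y = r_p (mod p) and y = r_q (mod q)."""
    # pow(p, -1, q) is the inverse of p modulo q (available from Python 3.8).
    t = ((r_q - r_p) * pow(p, -1, q)) % q
    return (r_p + p * t) % (p * q)


p_i, q_i = 1009, 2003   # assumed small primes standing in for p_i and q_i
f_i = p_i * q_i

x = 123456              # plaintext known to the factoring algorithm
r = 654321              # random element, playing the role of a_i = r

# y agrees with r modulo p_i and with x modulo q_i, as in Lemma 4.2.3.
y = crt_pair(r % p_i, p_i, x % q_i, q_i)

# q_i divides y - x, while p_i does not (because r != x mod p_i here),
# so the gcd isolates q_i.
factor = gcd(f_i, y - x)
```

With these values `factor` equals `q_i`. If $r$ happened to equal $x$ modulo $p_i$, the gcd would be $f_i$ itself and the attempt would fail, which corresponds to the $1 - \frac{1}{p_i}$ term in the success probability.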

Lemma 4.2.5: Let $m'$ be the number of plaintext and ciphertext pairs the adversary has access to. If for some $m'$ there exists an algorithm $A_d\left(C = \mathcal{E}(x, k), \{(x_l, C_l = \mathcal{E}(x_l, k))\}_{l=1}^{m'}\right)$ such that, given $m'$ chosen plaintext and ciphertext pairs $(x_l, C_l)$ and a ciphertext $C$, returns $x$ with some probability $p$, then there exists a PPT algorithm $A_f$ using $A_d$ as an oracle to factor $f_i$ for some $i$ with probability

$$p' = p\left(1 - \frac{1}{p_i}\right)\left(1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}\right).$$

Proof. As before, let the algorithm $A_f$ first choose random plaintexts $x_l \in Z_N$ for $1 \le l \le m'$, an additional random plaintext $x \in Z_N$, and a random key $k$, and construct ciphertexts $C_l = \mathcal{E}(x_l, k)$ and $C = \mathcal{E}(x, k)$ using the encryption scheme. Then let $A_f$ run $A_d(C)$ to obtain $o$. Then let $A_f$ return $\gcd(f_i, o - x)$ as a factor of $f_i$ for some $i$. We find the probability that $A_f$ succeeds in factoring $f_i$ for some $i$. Consider the case where, for some $i_0$, $a_{l,i_0} = x_l$ for all of the ciphertexts $C_l$, and where $a_{i_0} = r$ for the ciphertext $C$. Here $a_{l,i_0}$ refers to the $a_{i_0}$ in the encryption process of $C_l$, and $a_{i_0}$ refers to the $a_{i_0}$ in the encryption process of $C$. If this is the case, then as seen in the proof of Lemma 4.2.4, $\mathcal{E}(x_l, k) = \mathcal{E}(x_l, k_{i_0} k)$ (since $a_{l,i_0}$ is assumed to equal $x_l$), so the adversary cannot differentiate $k$ from $k_{i_0} k$. Additionally, as in Lemma 4.2.4, $\mathcal{E}(x, k) = \mathcal{E}(y, k_{i_0} k)$ for some $y$ for which $q_{i_0}$ divides $y - x$. Also, $y = a_{i_0} \bmod p_{i_0} = r \bmod p_{i_0}$, so $p_{i_0}$ does not divide $o - x = y - x$ with probability $1 - \frac{1}{p_{i_0}}$. Since the adversary cannot differentiate $k$ from $k_{i_0} k$, if running $A_d(C)$ returns $x$ with probability $p$, it must also return $y$ with probability $p$. Then the probability that $o = y$ is $p$, and if this happens the

probability that $q_{i_0}$ divides $o - x = y - x$ and $p_{i_0}$ does not is $1 - \frac{1}{p_{i_0}}$. Thus if $i_0$ exists then $A_f$ succeeds with probability $p\left(1 - \frac{1}{p_i}\right)$, where $i = i_0$.

To find the probability that such an $i_0$ exists, note that the probability that, for a specific $l$ and $i$, $a_{l,i} = x_l$ is $1 - \frac{1}{m+1}$, so the probability that $a_{l,i} = x_l$ for all $l$ is $\left(1 - \frac{1}{m+1}\right)^{m'}$. The probability that, additionally, $a_i = r$ is $\frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}$. Then the probability that this does not occur for a given $i$ is $1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}$, so the probability that this does not occur for any $i$ is $\left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$, and finally the probability that this occurs for some $i$ is thus

$$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}.$$

Finally we integrate the derivations and get

$$p' = p\left(1 - \frac{1}{p_i}\right)\left(1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}\right).$$

Lemma 4.2.6: Assuming that the probability to factor a $\lambda$-bit integer in polynomial time is negligible, the encryption scheme is secure for $m' \le m$.

Proof. As seen in the equation for $p'$ in Lemma 4.2.5, $1 - \frac{1}{p_i}$ is nonnegligible. If $1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is also nonnegligible, then $p$ is negligible if and only if $p'$ is negligible. This implies that the encryption scheme is secure: otherwise, if a PPT adversary can attack the scheme with nonnegligible success probability $p$, then there will exist a PPT algorithm to factor some integer with nonnegligible success probability $p'$, which contradicts Lemma 4.1.2. Therefore, the encryption scheme is secure if $1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is nonnegligible.

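Note that $\frac{d}{d\alpha}\left(1 - \frac{1}{\alpha}\right)^{\alpha} = \left(1 - \frac{1}{\alpha}\right)^{\alpha}\left(\ln\left(1 - \frac{1}{\alpha}\right) + \frac{1}{\alpha - 1}\right) > 0$ for $\alpha > 1$, because $\ln(1 - t) > -\frac{t}{1 - t}$ for $0 < t < 1$ (take $t = 1/\alpha$). Thus the function is monotonically increasing, so for $\alpha = m + 1$ with $m \in Z^{+}$ it achieves its minimum at $m = 1$, where it takes the value $\left(1 - \frac{1}{1+1}\right)^{1+1} = \frac{1}{4}$. Additionally, since $\lim_{\alpha \to \infty}\left(1 - \frac{1}{\alpha}\right)^{\alpha} = \frac{1}{e}$ and since $\left(1 - \frac{1}{\alpha}\right)^{\alpha}$ is monotonically increasing, $\left(1 - \frac{1}{\alpha}\right)^{\alpha} \le \frac{1}{e}$. Then we obtain

$$\begin{aligned}
1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}
&\ge 1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m+1}\right)^{m} \\
&\ge 1 - \left(1 - \frac{1}{4(m+1)}\right)^{m} \\
&= 1 - \left(\left(1 - \frac{1}{4(m+1)}\right)^{4(m+1)}\right)^{\frac{m}{4(m+1)}} \\
&\ge 1 - \left(\frac{1}{e}\right)^{\frac{m}{4(m+1)}} \\
&\ge 1 - \left(\frac{1}{2}\right)^{\frac{1}{4(1+1)}} = 1 - \left(\frac{1}{2}\right)^{\frac{1}{8}} \ge 1 - .92 = .08
\end{aligned}$$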
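As a sanity check on this chain of inequalities, the Python sketch below evaluates the probability from Lemma 4.2.5 for every $m' \le m$ over an arbitrarily chosen range of $m$ and compares the smallest value with the constant $1 - (1/2)^{1/8}$. The function name and the range of $m$ are our own choices; this is a numerical illustration, not a proof.

```python
def existence_probability(m, m_prime):
    """Evaluate 1 - (1 - (1/(m+1)) * (1 - 1/(m+1))**m_prime)**m."""
    hit = (1.0 / (m + 1)) * (1.0 - 1.0 / (m + 1)) ** m_prime
    return 1.0 - (1.0 - hit) ** m


# Constant lower bound obtained at the end of the derivation (about 0.083).
constant_bound = 1.0 - 0.5 ** (1.0 / 8.0)

# Smallest value of the probability over all m' <= m in an assumed test range.
worst = min(existence_probability(m, m_prime)
            for m in range(1, 201)
            for m_prime in range(0, m + 1))
```

In this range the smallest value occurs at $m = m' = 1$, where the probability is $0.25$, comfortably above the constant bound.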

We now prove the security of our homomorphic encryption algorithm.

Theorem 4.2.7: The bound of 𝑚′ in Lemma 4.2.6 can be weakened to 𝑚′ ≤ 𝑚 ln poly(𝜆),

where poly(𝜆) denotes some fixed polynomial in 𝜆.

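Proof. As shown in Lemma 4.2.6, the encryption scheme is secure if $1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is nonnegligible. In other words, we require that $1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m} = \frac{1}{\mathrm{poly}(\lambda)}$ for some fixed polynomial in $\lambda$. Then

$$m' = \frac{\ln\left((m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right)}{\ln\left(1 - \frac{1}{m+1}\right)}$$

Before estimating the lower bound of $m'$, we first derive two inequalities. Note that $\frac{d}{d\alpha}\left(\ln(1 - \alpha) + \frac{\alpha}{1 - \alpha}\right) = \frac{\alpha}{(1 - \alpha)^2} > 0$ for $0 < \alpha < 1$, and $\left.\left(\ln(1 - \alpha) + \frac{\alpha}{1 - \alpha}\right)\right|_{\alpha = 0} = 0$. Therefore $\ln(1 - \alpha) > -\frac{\alpha}{1 - \alpha}$ for $0 < \alpha < 1$. Also, note that $\frac{d}{d\alpha}\left(\alpha - 1 + e^{-\alpha}\right) = 1 - e^{-\alpha} > 0$ for $0 < \alpha < 1$, and $\left.\left(\alpha - 1 + e^{-\alpha}\right)\right|_{\alpha = 0} = 0$. Therefore $\alpha > 1 - e^{-\alpha}$ for $0 < \alpha < 1$. Thus,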

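$$\begin{aligned}
& \ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right) > -\frac{\frac{1}{\mathrm{poly}(\lambda)}}{1 - \frac{1}{\mathrm{poly}(\lambda)}} = -\frac{1}{\mathrm{poly}(\lambda) - 1} \\
\Rightarrow\; & \ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} = \frac{1}{m}\ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right) > -\frac{1}{m(\mathrm{poly}(\lambda) - 1)} \\
\Rightarrow\; & \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} > e^{-\frac{1}{m(\mathrm{poly}(\lambda) - 1)}} \\
\Rightarrow\; & 1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} < 1 - e^{-\frac{1}{m(\mathrm{poly}(\lambda) - 1)}} < \frac{1}{m(\mathrm{poly}(\lambda) - 1)} \\
\Rightarrow\; & \ln\left((m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right) < \ln\frac{m+1}{m(\mathrm{poly}(\lambda) - 1)} = -\ln \mathrm{poly}(\lambda) \qquad (4.1)
\end{aligned}$$

The last equation in (4.1) is obtained by replacing $\mathrm{poly}(\lambda)$ with the polynomial $\mathrm{poly}(\lambda) + 1 + \frac{\mathrm{poly}(\lambda)}{m}$. Also, we have

$$\ln\left(1 - \frac{1}{m+1}\right) > -\frac{\frac{1}{m+1}}{1 - \frac{1}{m+1}} = -\frac{1}{m} \;\Rightarrow\; \frac{1}{\ln\left(1 - \frac{1}{m+1}\right)} < -m \qquad (4.2)$$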

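Hence, by multiplying (4.1) and (4.2), we get

$$m' = \frac{\ln\left((m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right)}{\ln\left(1 - \frac{1}{m+1}\right)} > m \ln \mathrm{poly}(\lambda).$$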
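The bound can be checked numerically for sample parameters. The Python sketch below evaluates the exact expression for $m'$ and the lower bound that results from (4.1) and (4.2) before the polynomial is renamed, namely $m \ln\frac{m(\mathrm{poly}(\lambda) - 1)}{m+1}$. The values $m = 16$ and $\mathrm{poly}(\lambda) = 2^{100}$ are chosen for illustration, and the function names are hypothetical. `math.log1p` and `math.expm1` are used because $1 - (1 - 1/\mathrm{poly}(\lambda))^{1/m}$ would otherwise round to zero in floating point.

```python
import math


def m_prime_exact(m, poly):
    """Solve 1 - (1 - (1/(m+1)) * (1 - 1/(m+1))**m')**m = 1/poly for m'."""
    # gap = 1 - (1 - 1/poly)**(1/m), computed without catastrophic cancellation.
    gap = -math.expm1(math.log1p(-1.0 / poly) / m)
    return math.log((m + 1) * gap) / math.log1p(-1.0 / (m + 1))


def m_prime_lower_bound(m, poly):
    """m * ln(m * (poly - 1) / (m + 1)), the product of bounds (4.1) and (4.2)."""
    return m * math.log(m * (poly - 1.0) / (m + 1))


m = 16               # assumed sample value
poly = 2.0 ** 100    # assumed sample value of poly(lambda)

exact = m_prime_exact(m, poly)
bound = m_prime_lower_bound(m, poly)
headline = m * math.log(poly)   # m ln poly(lambda) for the same polynomial
```

For these values the exact $m'$ exceeds the bound, and $m \ln \mathrm{poly}(\lambda)$ rounds to 1109.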

We have shown that, under an attack with $m \ln \mathrm{poly}(\lambda)$ chosen plaintext and ciphertext pairs, the security of our encryption scheme reduces to the large integer factorization problem. Note that there is no constraint on the chosen plaintexts. In particular, the adversary can choose a plaintext multiple times.

4.2.4 Complexity of the Encryption and Decryption Algorithms

We need to choose $2m$ primes in the encryption scheme. As shown in Lemma 4.1.1 this takes polynomial time. Note that the primes can be precomputed. The decryption algorithm involves only two matrix multiplications, which, as shown later in Section 4.2.6, take $O(m\lambda \log m\lambda \log\log m\lambda)$ time. The encryption algorithm requires two matrix multiplications and also an algorithm to solve the $m$ linear congruences that define the values $a$, $b$, and $c$. It takes time $O(m\lambda)$ to construct the solution to these linear congruences, so the overall complexity for encryption is also $O(m\lambda \log m\lambda \log\log m\lambda)$.

4.2.5 Computation Algorithms

Multiplication and addition of encrypted elements are simply normal matrix multiplication and addition, respectively.

Lemma 4.2.8: The multiplication and addition algorithms are correct.

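Proof. First we show that addition is correct. Note that

$$\begin{aligned}
\mathcal{E}(x, k) + \mathcal{E}(y, k) &= k^{-1}\,\mathrm{diag}(x, a, b, c)\,k + k^{-1}\,\mathrm{diag}(y, a', b', c')\,k \\
&= k^{-1}\left(\mathrm{diag}(x, a, b, c) + \mathrm{diag}(y, a', b', c')\right)k \\
&= k^{-1}\,\mathrm{diag}(x + y, a + a', b + b', c + c')\,k = \mathcal{E}(x + y, k)
\end{aligned}$$

Next we show that multiplication is correct. Note that

$$\begin{aligned}
\mathcal{E}(x, k)\,\mathcal{E}(y, k) &= k^{-1}\,\mathrm{diag}(x, a, b, c)\,k\,k^{-1}\,\mathrm{diag}(y, a', b', c')\,k \\
&= k^{-1}\,\mathrm{diag}(x, a, b, c)\,\mathrm{diag}(y, a', b', c')\,k \\
&= k^{-1}\,\mathrm{diag}(xy, aa', bb', cc')\,k = \mathcal{E}(xy, k).
\end{aligned}$$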
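The two identities can be exercised on a toy instance. The Python sketch below is a minimal illustration under stated assumptions, not the key generation or encryption procedure of Section 4.2.2: the modulus is a product of two small primes, the key is built from elementary shear matrices so that its inverse is known exactly, and the auxiliary diagonal entries $a$, $b$, $c$ are arbitrary numbers instead of values derived from the plaintext. All function names are hypothetical.

```python
def mat_mul(A, B, n):
    """Multiply two square matrices modulo n."""
    size = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(size)) % n
             for j in range(size)] for i in range(size)]


def mat_add(A, B, n):
    """Add two square matrices modulo n."""
    size = len(A)
    return [[(A[i][j] + B[i][j]) % n for j in range(size)]
            for i in range(size)]


def diag(*entries):
    """Build a diagonal matrix from the given entries."""
    size = len(entries)
    return [[entries[i] if i == j else 0 for j in range(size)]
            for i in range(size)]


def toy_key(steps, n):
    """Assumed key generator: a product of shears I + t*e_ij, with its inverse."""
    k, k_inv = diag(1, 1, 1, 1), diag(1, 1, 1, 1)
    for i, j, t in steps:
        shear, unshear = diag(1, 1, 1, 1), diag(1, 1, 1, 1)
        shear[i][j], unshear[i][j] = t % n, -t % n
        k = mat_mul(k, shear, n)            # k     = S1 * S2 * ...
        k_inv = mat_mul(unshear, k_inv, n)  # k_inv = ... * S2^-1 * S1^-1
    return k, k_inv


def encrypt(x, a, b, c, key, n):
    """E(x, k) = k^-1 * diag(x, a, b, c) * k mod n."""
    k, k_inv = key
    return mat_mul(mat_mul(k_inv, diag(x, a, b, c), n), k, n)


def decrypt(C, key, n):
    """Recover x as the top-left entry of k * C * k^-1 mod n."""
    k, k_inv = key
    return mat_mul(mat_mul(k, C, n), k_inv, n)[0][0]


N = 1009 * 2003   # assumed toy modulus
key = toy_key([(0, 1, 5), (1, 2, 7), (2, 3, 11), (3, 0, 13), (2, 0, 3)], N)

cx = encrypt(12, 34, 56, 78, key, N)
cy = encrypt(5, 9, 2, 6, key, N)

sum_plain = decrypt(mat_add(cx, cy, N), key, N)    # decrypts to 12 + 5
prod_plain = decrypt(mat_mul(cx, cy, N), key, N)   # decrypts to 12 * 5
```

The server only ever adds or multiplies the ciphertext matrices; the key is needed solely for the final decryption.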

4.2.6 Complexity of the Computation Algorithms

We now consider the complexity of our multiplication and addition algorithms. First consider the size of the integers in the ring $Z_N$. The value $N$ is the product of $m$ numbers of length $\lambda$ bits, so it is approximately an $m\lambda$-bit number. There exist efficient algorithms for multiplication of $b$-bit integers with complexity $O(b \log b \log\log b)$. For $b = m\lambda$ this becomes $O(m\lambda \log m\lambda \log\log m\lambda)$. In our case matrix multiplication involves 64 multiplications and 64 additions. Since addition can be done in linear time, matrix multiplication is dominated by the integer multiplications and thus has complexity $O(m\lambda \log m\lambda \log\log m\lambda)$. Addition is linear and thus has complexity $O(m\lambda)$.

4.3 Homomorphic Encryption in Multi-user Systems

The homomorphic encryption scheme discussed in Section 4.2 cannot be securely used in practical systems. To allow computation on encrypted data, the data stored on the database server should be encrypted by the same master key. The master key then has to be shared by multiple users who need to access the data. A user may need to encrypt secret data and send them with a request to the server. Also, the server may send a response, which contains some encrypted data, to a user, and the user needs to decrypt the ciphertexts in the response. Having all users hold the master key can compromise the security of the system, especially if the users are from many different domains.

Our solution is to let each user hold a unique user key $k_i$ and use a transformation function to transform the ciphertext encrypted by the user key $k_i$ into the ciphertext encrypted by the master key $k$. Such a transformation scheme may not always be easy to obtain for some encryption algorithms. In our scheme, data are encrypted into a matrix representation and a similarity transformation function can be used to achieve the goal. We develop the corresponding communication protocol for sending secret data between the users and the server using individual user keys, and then use the similarity transformation to convert ciphertexts under the user keys into ciphertexts under the master encryption key. In Subsection 4.3.1, we define a model for the multi-user systems. In Subsection 4.3.2, we present the protocols for the user to send requests and receive responses while using her own user key to encrypt and decrypt the data in the request and the response, respectively.

4.3.1 Settings

As in the multi-user system described in Chapter 3, we consider a single server DB hosting a database and a set of users U = {Ui | i ≥ 1} accessing the data stored on DB. For security assurance, a key agent KA1 is added between the server and the users. Thus, the adversary structure is

$$Z = \{U_{\mathcal{A}},\; \{DB\} \cup U_{\mathcal{A}},\; \{KA1\} \cup U_{\mathcal{A}},\; \{DB, KA1\} \cup U_{\mathcal{A}} \mid U_{\mathcal{A}} \subseteq U\},$$

where $U_{\mathcal{A}}$ is the set of compromised users, which could be empty. We let user Ui hold a user key $k_i$, KA1 hold the first matching key of $k_i$, denoted $k'_i$, and DB hold the second matching key of $k_i$, denoted $k''_i$, where $k = k_i \cdot k'_i \cdot k''_i$ is the master key of the system. The keys are generated and distributed by a trusted party TP at system initialization time.

The data hosted by the DB may have different criticality levels and may be protected in

different ways. We only consider the data that should be protected during computation time and

encrypt them using our homomorphic encryption scheme. Additional protection can be added by

encrypting these data using a conventional encryption scheme, such as AES, when they are in

memory or disk and decrypt them in CPU. Also, we assume that the communications between

any two entities are via secure channels (e.g. messages are properly encrypted and

communication keys are properly established). The adversary cannot know the content of the

communication unless it compromises at least one entity. (In our protocol, we do not discuss the

additional protection mechanisms but only consider the steps relevant to our homomorphic

encryption.)

4.3.2 The Multi-User Access Protocol

Here, we first introduce the similarity transformation function, which plays the central role

in the construction of the protocols, and discuss its property. Let 𝜑 be the similarity

transformation function

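$$\varphi(C, k') = (k')^{-1} \cdot C \cdot k'$$

where $C = \mathcal{E}(x, k)$ is the ciphertext, and $k$, $k'$ are encryption keys. Then $\varphi$ can transform the encryption key from $k$ to $k \cdot k'$ based on the following lemma.

Lemma 4.3.1: If $C = \mathcal{E}(x, k)$, then $\varphi(C, k') = \mathcal{E}(x, k \cdot k')$.

Proof. $\varphi(C, k') = \varphi(\mathcal{E}(x, k), k') = (k')^{-1} \cdot \mathcal{E}(x, k) \cdot k' = (k')^{-1} \cdot k^{-1}\,\mathrm{diag}(x, a, b, c)\,k \cdot k' = (k \cdot k')^{-1} \cdot \mathrm{diag}(x, a, b, c) \cdot (k \cdot k') = \mathcal{E}(x, k \cdot k')$.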

The process of the system consists of two phases: the system initialization phase, and the

request-response phase. At the system initialization time, a trusted party TP generates and

distributes the keys to the DB, KA1, and users, then TP exits the system and destroys all the key

related knowledge. In the request-response phase, the user Ui sends a request to DB, and DB

processes it, and sends a response to Ui.

Key generation and distribution. TP generates the master key $k$ by the method discussed in Section 4.2.2. Then TP generates many key triples $(k_i, k'_i, k''_i)$. It uses the same method (Section 4.2.2) to randomly generate the user key $k_i$ and the first matching key $k'_i$. Then, it computes the second matching key $k''_i = (k'_i)^{-1} \cdot k_i^{-1} \cdot k$. TP sends $k_i$ to user Ui, $k'_i$ to KA1, and $k''_i$ to DB. In a static system, TP exits the system after key initialization and distribution. In a dynamic system where new users may join the system dynamically, a key manager KM is also introduced to manage user keys. TP sends the list of unused user keys to KM to be distributed to new users later. TP exits after key initialization and distribution. Note that matching keys can be associated using their indices.

Request-response protocol. The main issue in the request-response protocol is how to

encrypt the data to be sent with the request and how to decrypt the data in the response. The

pseudo code for the request-response protocol is given in Figure 4.1. In the protocol, the critical

data in the request is encrypted (Line 1), and the encryption key is then transformed (Lines 2 and

3) by the KA1 and DB, respectively. Then, in Line 4, the DB processes the request and generates

the response (with an encrypted data in the response). The encryption key of the critical data in

the response is transformed (Lines 5 and 6) by the DB and KA1, respectively. Finally, the user

decrypts and gets the result (Line 7).
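The message flow described above can be sketched in Python. This is an illustration under stated assumptions rather than the implementation evaluated in this chapter: keys are toy 4 × 4 matrices built from shear matrices (so that exact inverses are available), the modulus is a product of two small primes, the auxiliary diagonal entries are arbitrary, and the "processing" at DB is a single homomorphic squaring. All function names are hypothetical.

```python
def mat_mul(A, B, n):
    """Multiply two square matrices modulo n."""
    size = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(size)) % n
             for j in range(size)] for i in range(size)]


def diag(*entries):
    """Build a diagonal matrix from the given entries."""
    size = len(entries)
    return [[entries[i] if i == j else 0 for j in range(size)]
            for i in range(size)]


def toy_key(steps, n):
    """Assumed key generator: returns the pair (k, k^-1) built from shears."""
    k, k_inv = diag(1, 1, 1, 1), diag(1, 1, 1, 1)
    for i, j, t in steps:
        shear, unshear = diag(1, 1, 1, 1), diag(1, 1, 1, 1)
        shear[i][j], unshear[i][j] = t % n, -t % n
        k = mat_mul(k, shear, n)
        k_inv = mat_mul(unshear, k_inv, n)
    return k, k_inv


def inverse(key):
    """Key pair for k^-1."""
    return key[1], key[0]


def compose(first, second, n):
    """Key pair for the product first * second."""
    return mat_mul(first[0], second[0], n), mat_mul(second[1], first[1], n)


def phi(C, key, n):
    """Similarity transformation phi(C, k') = k'^-1 * C * k'."""
    k, k_inv = key
    return mat_mul(mat_mul(k_inv, C, n), k, n)


N = 1009 * 2003   # assumed toy modulus
master = toy_key([(0, 1, 5), (1, 2, 7), (2, 3, 11), (3, 0, 13)], N)  # k
user = toy_key([(0, 2, 17), (1, 3, 19), (3, 1, 23)], N)              # k_i
agent = toy_key([(1, 0, 29), (2, 1, 31), (0, 3, 37)], N)             # k'_i
# Second matching key k''_i = k'_i^-1 * k_i^-1 * k, held by DB.
db = compose(compose(inverse(agent), inverse(user), N), master, N)

d = 42
plain = diag(d, 3, 5, 8)                        # arbitrary auxiliary entries
c_user = phi(plain, user, N)                    # (1) Ui encrypts d under k_i
c_agent = phi(c_user, agent, N)                 # (2) KA1 transforms to k_i * k'_i
c_master = phi(c_agent, db, N)                  # (3) DB transforms to k
response = mat_mul(c_master, c_master, N)       # (4) DB computes E(d * d, k)
r_agent = phi(response, inverse(db), N)         # (5) DB transforms back
r_user = phi(r_agent, inverse(agent), N)        # (6) KA1 transforms back to k_i
result = phi(r_user, inverse(user), N)[0][0]    # (7) Ui decrypts with k_i
```

In this sketch no single party other than the trusted initializer ever holds the master key: KA1 sees only $k'_i$ and DB only $k''_i$, while the ciphertext that reaches DB is identical to a direct encryption under $k$.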

(1) Ui prepares a request q with a sensitive data item $d$. It encrypts $d$ with $k_i$, obtaining $\mathcal{E}(d, k_i)$, and sends q with $\mathcal{E}(d, k_i)$ to KA1.

(2) KA1 computes $\varphi(\mathcal{E}(d, k_i), k'_i) = \mathcal{E}(d, k_i \cdot k'_i)$ and sends the updated request q to DB.

(3) DB computes $\varphi(\mathcal{E}(d, k_i \cdot k'_i), k''_i) = \mathcal{E}(d, k_i \cdot k'_i \cdot k''_i) = \mathcal{E}(d, k)$.

(4) DB processes q with $\mathcal{E}(d, k)$ and generates response r with an encrypted data item $\mathcal{E}(d', k)$.

(5) DB computes $\varphi(\mathcal{E}(d', k), (k''_i)^{-1}) = \mathcal{E}(d', k_i \cdot k'_i)$ and sends $\mathcal{E}(d', k_i \cdot k'_i)$ with r to KA1.

(6) KA1 further computes $\varphi(\mathcal{E}(d', k_i \cdot k'_i), (k'_i)^{-1}) = \mathcal{E}(d', k_i)$ and sends $\mathcal{E}(d', k_i)$ with r to Ui.

(7) Ui receives the response, decrypts $\mathcal{E}(d', k_i)$ with $k_i$, and gets $d'$.

Figure 4.1. Request processing protocol.

Theorem 4.3.2: Suppose that the adversary attacks the multi-user system with the adversary structure AS. The system is secure if the adversary collects fewer than $m \ln \mathrm{poly}(\lambda)$ plaintext-ciphertext pairs.

Proof. We consider three compromising situations with respect to the adversary structure AS.

Case 1. The adversary compromises DB and KA1. Then $k''_i$ and $k'_i$ are compromised, but the master key $k$ and the key $k_i$ of the user Ui are intact. Therefore the adversary can reverse neither the ciphertext encrypted by $k$ stored on DB, nor the ciphertext encrypted by $k_i$ sent from Ui to KA1.

Case 2. The adversary compromises DB and $U_{\mathcal{A}}$. Then the keys $k''_i$ are compromised, and $k_j$ is compromised if $U_j \in U_{\mathcal{A}}$, but the master key $k$ and the key $k_{j'}$ of the user $U_{j'}$ are intact if $U_{j'} \notin U_{\mathcal{A}}$. Therefore the adversary can reverse neither the ciphertext encrypted by $k$ stored on DB, nor the ciphertext encrypted by $k_{j'} \cdot k'_{j'}$ sent from KA1 to DB.

Case 3. The adversary compromises KA1 and $U_{\mathcal{A}}$. Then the keys $k'_i$ are compromised, and $k_j$ is compromised if $U_j \in U_{\mathcal{A}}$, but the key $k_{j'}$ of the user $U_{j'}$ is intact if $U_{j'} \notin U_{\mathcal{A}}$. Therefore the adversary can reverse neither the ciphertext encrypted by $k_{j'}$ sent from $U_{j'}$ to KA1, nor the ciphertext encrypted by $k_{j'} \cdot k'_{j'}$ sent from DB to KA1.

This completes the proof.

4.4 Performance of Our Homomorphic Encryption Scheme

We implement our algorithm and evaluate its execution time. The large integer multiplication and addition were implemented using the GNU Multiple Precision (GMP) Arithmetic Library [33]. In Figure 4.2, we give the number of milliseconds required to perform addition and multiplication of encrypted data (4 × 4 matrices over the ring $Z_N$). The computations were performed on a 2.16 GHz Intel Core 2 Duo processor. The security parameter considered was $\lambda = 1024$. The data was gathered from running 10000 additions and multiplications of randomly selected numbers of length $m\lambda$ bits.

[Figure 4.2 has two panels, Addition and Multiplication, each plotting Time (ms) against m.]

Figure 4.2. The size of $m$ against the speed of addition and multiplication of two encrypted data (in milliseconds).

As can be seen from Figure 4.2, for a small enough $m$, the algorithm is very efficient. For example, for $\lambda = 1024$ and $m = 16$, the algorithm runs multiplication in only 108 milliseconds and runs addition in a tenth of a millisecond. For such $\lambda$ and $m$, we can choose $\mathrm{poly}(\lambda) = \lambda^{10} = 2^{100}$, which translates into $m \ln \mathrm{poly}(\lambda) \approx 1109$ chosen plaintexts in an attack that our algorithm can withstand. This makes the algorithm practical for real world implementation where large scale plaintext attacks are not an issue.

For the purpose of comparison, we estimate the computation time of Gentry’s homomorphic encryption scheme [31]. In [32], the performance of the primitive operations has been studied: the bootstrapping (re-crypt) time is 6 seconds, which dominates the time of the operations. For l-bit numbers, the addition circuit needs 5*l gates and the multiplication circuit needs 11*l^2 gates [51]. Thus, Gentry’s homomorphic encryption scheme needs 5*32*6 (> 900) seconds to add two 32-bit numbers, and 11*32^2*6 (> 67000) seconds to multiply two 32-bit numbers. This is far slower than our homomorphic encryption scheme (0.1 milliseconds to add two 32 bit numbers and 108 milliseconds to multiply two 32 bit numbers).
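The arithmetic behind the estimate for Gentry's scheme can be reproduced in a few lines of Python. The sketch assumes, as the estimate above implicitly does, one re-crypt operation per gate; the constant and function names are our own.

```python
RECRYPT_SECONDS = 6   # bootstrapping (re-crypt) time quoted in the text
BITS = 32             # operand length l


def addition_gates(l):
    """Gate count of the addition circuit for l-bit numbers."""
    return 5 * l


def multiplication_gates(l):
    """Gate count of the multiplication circuit for l-bit numbers."""
    return 11 * l ** 2


# Assumption: every gate costs one re-crypt.
add_seconds = addition_gates(BITS) * RECRYPT_SECONDS
mul_seconds = multiplication_gates(BITS) * RECRYPT_SECONDS
```

This gives 960 seconds for one addition and 67584 seconds for one multiplication, consistent with the bounds of more than 900 and more than 67000 seconds.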

We also study the performance of our request and response protocols for multi-user data processing systems. Assume that the size of the secret data in the request and response is the same as the security parameter. To factor in the communication cost, we simulate the case where the user is at UTD, the KA1 is at ASU, and the DB is at UCLA. The performance results for $\lambda = 1024$ and $m = 16$ are shown in Table 4.1; the execution times are measured in milliseconds.

Table 4.1. The performance of the communication protocol

Operation | Process Description | Performance
Encryption | Encrypt one data item | 215 ms
Decryption | Decrypt one data item | 34 ms
Transform | Local one-time transformation | 215 ms
The user sends a request to DB | User encrypts the data and sends the request to KA1, KA1 transforms the embedded data and sends the request to DB, DB further transforms the data in the request. | 807 ms (85 ms if the data in the request is not encrypted and is sent from user to DB directly)
DB sends a response to the user | DB transforms the data in the response and sends the response to KA1, KA1 transforms the data and sends the response to the user, the user decrypts the data | 806 ms (77 ms if the data in the response is not encrypted and is sent from DB to user directly)

As can be seen, by using the communication protocol with our homomorphic encryption

algorithm, the performance for sending a request and receiving a response is degraded by

approximately 10 times from the case where encryption is not used, but it is still a reasonable

cost to achieve the desired security.

4.5 Summary

In this chapter, we presented a novel non-circuit based homomorphic encryption algorithm. Our scheme is fully homomorphic, but it is not semantically secure. The security of our homomorphic encryption scheme is equivalent to the well known large integer factorization problem (which is also the security basis for RSA), but it requires a chosen bound on the plaintext attack. Even though Gentry’s and the subsequent solutions are semantically secure, their time complexity is too high for practical use. Also, their circuit based approach suffers from a significant overhead. Our scheme yields a very practical time complexity for encryption, decryption, and computation on ciphertexts. Specifically, to withstand a chosen plaintext attack with over 1000 plaintexts, the algorithm runs addition in only a tenth of a millisecond and multiplication in about a hundred milliseconds. In contrast, Gentry’s algorithm requires more than 900 seconds for addition and more than 67000 seconds for multiplication.

Our homomorphic encryption algorithm is symmetric-key based while most of the existing

algorithms are public key based. The only advantage of the public key homomorphic encryption

schemes is the possibility of encrypting data without needing to know the private key, i.e., so that

many clients can issue the requests to the encrypted database. However, in almost all

applications, it is necessary, but not secure, for the client to know the private key in order to read

back and decrypt the data in the response. Our request-response communication protocol for our

symmetric-key homomorphic encryption scheme can secure the request and response processes

in multi-user systems.

CHAPTER 5

ORDER PRESERVING ENCRYPTION SCHEMES

An order-preserving encryption (OPE) scheme is a deterministic symmetric-key encryption scheme. The ciphertexts of OPE preserve the order of the plaintexts. Thus search queries can be processed efficiently using conventional DBMS techniques, e.g. establishing a B+ tree on ciphertexts encrypted by OPE. On the other hand, OPE is not a perfectly secure encryption scheme since ciphertexts inevitably leak the order information of the plaintexts. It is therefore important to know exactly how much security an OPE scheme can provide. Unfortunately, the existing security analyses for OPE schemes are either informal (the security analysis is based on experiments [3]) or incomplete (the security analysis reduces the security of an OPE scheme to the ideal OPE object (a special OPE) without further analyzing the security of the ideal OPE object [12]). After presenting the formal definition of the ideal OPE object in Section 5.1, we complete the security analysis in [12] by proving the one-wayness security of the ideal OPE object [75, 77] in Section 5.2. We estimate the expected number of bits of information $z_h$ (formulated by the average min-entropy) of the plaintext that remain secret from the adversary against a known plaintext attack with $h$ known plaintexts. The result shows that the ratio of $z_h$ to the length of the plaintext is greater than a constant. Since the probability for any adversary to fully recover the plaintext is less than or equal to $1/2^{z_h}$, the estimation of $z_h$ implies the one-wayness security of the ideal OPE object, i.e., the probability for any PPT adversary to fully recover the plaintext encrypted by the ideal OPE object against an $h$ known plaintext attack is a negligible function of the security parameter. The security analysis result not only helps improve our understanding of the security of OPE schemes and guides their parameter selection, but also provides a general method for analyzing their security. A similar result is also given in [13] after our work was published as a technical report and submitted to conferences. In [15], the authors first estimate the probability of a value being a ciphertext's most likely plaintext (m.l.p.), and then approximate the sum of m.l.p. probabilities over the ciphertext space to get the average attacking success probability.

The ideal object has been used as the security goal in the security definitions in much of the existing literature. Intuitively, such an approach tries to make the cipher behave as randomly as possible in order to achieve the highest security. For deterministic encryption, the security definition in [7, 49] requires it to be indistinguishable from the “ideal” object that is a function drawn at random from all possible permutations. For order-preserving encryption, the security definition in [12] requires it to be indistinguishable from the “ideal” object that is a function drawn at random from all possible order-preserving functions. For prefix-preserving encryption, the security definition in [12] requires it to be indistinguishable from the “ideal” object that is a function drawn at random from all possible prefix-preserving functions. However, it has not been carefully examined whether the ideal object as defined in [12] has the highest possible security for OPE schemes. It is meaningless to construct a real scheme indistinguishable from an “ideal” object which is not secure. It can be shown that the ideal deterministic encryption object achieves the highest security notion IND-DCPA (Indistinguishability against Distinct Chosen-Plaintext Attacks) [7, 49]. Consequently, it is valid to use the ideal object in the security definition of deterministic encryption. For OPE, the authors in [12] attempt to prove that the ideal OPE object achieves the security notion IND-OCPA (Indistinguishability against Ordered Chosen-Plaintext Attacks). But they discover that, given two randomly selected plaintexts, the distance of the corresponding ciphertexts is small if the distance of the plaintexts is small, and large if the distance of the plaintexts is large. Based on this property, the authors design the big jump attack to retrieve the “distance” information about the plaintexts from the ciphertexts. Since it leaks more than order information, the ideal OPE object cannot achieve the security notion IND-OCPA. Hence, there is no proof or any evidence to show that the ideal OPE object achieves the highest security notion. In fact we prove that the ideal OPE object is not necessarily the most secure OPE (Section 5.3).
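To make the notion of an ideal OPE object concrete, the Python sketch below draws an order-preserving function uniformly at random on a toy domain. It relies on the observation that a strictly increasing function is determined by its image, so a uniformly random subset of the range, sorted, yields a uniformly random order-preserving function. The domain size, range size, seed, and function names are assumptions chosen for illustration; the formal definition is the one given in Section 5.1.

```python
import random


def sample_ideal_ope(domain_size, range_size, rng):
    """Draw a strictly increasing map {0..domain_size-1} -> {0..range_size-1}
    uniformly at random, returned as a lookup table."""
    # Choosing the image uniformly and sorting it fixes the function.
    return sorted(rng.sample(range(range_size), domain_size))


rng = random.Random(2012)   # fixed seed so that the example is reproducible
table = sample_ideal_ope(16, 1024, rng)


def encrypt(x):
    """Toy encryption: the secret key is the sampled table itself."""
    return table[x]


# Gaps between ciphertexts of neighbouring plaintexts; ciphertext distances
# of this kind are what the big jump attack exploits.
gaps = [encrypt(x + 1) - encrypt(x) for x in range(15)]
```

Because the table is strictly increasing, comparisons on ciphertexts agree with comparisons on plaintexts, which is what allows a conventional index to be built over the encrypted column.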

In Section 5.4 we design two generalized OPE (GOPE) algorithms in polynomial-sized and superpolynomial-sized domains to satisfy stronger notions of security than the ideal OPE object. First, we consider the security notion IND-OCPA for OPE algorithms in polynomial-sized domains. Note that the attacks designed so far (such as the big jump attack given in [12] to show that the ideal OPE object cannot achieve IND-OCPA in superpolynomial-sized domains) do not eliminate the possibility of designing an OPE algorithm that is secure under IND-OCPA in polynomial-sized domains. In fact, we extend the concept of encryption to design the GOPE algorithms to achieve IND-OCPA. The difference between OPE and GOPE lies in the fact that the ciphertexts of OPE are numbers while the ciphertexts of GOPE are allowed to be general mathematical objects. Hence, the GOPE scheme requires a special comparison algorithm to compare the ciphertexts.

We also study the security level OPE can achieve in superpolynomial-sized domains. We weaken the security notion from IND-OCPA to IND-OLCPA. IND-OLCPA places one more constraint on the adversary compared to IND-OCPA, that is, the range of plaintexts in the oracle queries is bounded by a polynomial $g_1$, i.e., the difference between the largest and the smallest plaintexts in the oracle queries is less than or equal to $g_1$. We show that the lower bound on the advantage of an adversary against any OPE algorithm under IND-OLCPA is $\frac{1}{g}$, where $g$ is a polynomial. Note that this lower bound is not achieved by the ideal OPE object. Accordingly, we construct another GOPE algorithm to achieve this lower bound under IND-OLCPA.

is constructed based on two building blocks and . is adapted from such that the

ciphertext of a plaintext is secure under IND-OLCPA if can only support comparison between

two plaintexts whose difference is bounded by ( is designed to facilitate the comparison between

two plaintexts should preserve the order of the corresponding plaintexts should also guarantee

that the ciphertexts as follows: The ciphertexts of the first includes and , for any pair of

plaintexts, either or will fulfill the comparison task. Also, since the attacker can only query

plaintexts within the range are indistinguishable and the ciphertexts from have a small statistical

distance. Thus, achieves the lower bound on the advantage of an adversary. As discussed in

Chapter 1, existing OPE schemes suffer from the single-key problem and cannot support multi-user systems. To solve the single-key problem, we develop protocols to support multi-user data-centric systems where OPE schemes are used to protect the sensitive data that need to be searched in encrypted form. We introduce a group of key agents into the system and devise a protocol that enables “distributed encryption” to ensure that the OPE encryption key is not known


by any entity in the system. In Section 5.5, we briefly discuss our approach. Then, in Section 5.6, we develop a digit-based OPE (DOPE) protocol, where p key agents are deployed between the DB and the users. The master encryption key is shared among the p key agents such that each key agent holds a different encryption key. Secret data x is mapped into p “digits”, and each of the p digits is encrypted by a separate key agent with a distinct key using an existing OPE scheme (any OPE scheme can be used with our protocol). The ciphertexts of the digits are sent to the DB and integrated into the final ciphertext. Since the ciphertext of each digit is order-preserving, the integrated ciphertext is order-preserving.
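The digit-based idea can be illustrated with a small Python sketch. It is not the protocol of Section 5.6: the base-10 digit mapping, the table-based toy OPE used by each agent, the positional way of integrating the digit ciphertexts, and the names `agent_key`, `to_digits`, and `dope_encrypt` are all assumptions made only for illustration.

```python
import random

def agent_key(base, width, rng):
    """Toy per-agent OPE key: a random strictly increasing table digit -> [0, width)."""
    return sorted(rng.sample(range(width), base))

def to_digits(x, base, p):
    """Map x to p base-`base` digits, most significant digit first (assumed mapping)."""
    digits = []
    for _ in range(p):
        x, d = divmod(x, base)
        digits.append(d)
    return digits[::-1]

def dope_encrypt(x, keys, base, width):
    """Each agent encrypts one digit; the DB integrates the results positionally."""
    ciphertext = 0
    for key, d in zip(keys, to_digits(x, base, len(keys))):
        # key[d] < width, so the positional encoding keeps the lexicographic order
        ciphertext = ciphertext * width + key[d]
    return ciphertext
```

Because every digit table is strictly increasing and the digits are combined most-significant first, comparing two integrated ciphertexts gives the same result as comparing the plaintexts, which is the property the paragraph above relies on.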

The basic DOPE protocol has some security problems. A key agent can see the plain digit,

which reveals part of the confidential data x. Additionally, if an adversary compromises the DB and one key agent, then he can use the key to compromise the same digit of every data item in the DB. To cope with these attacks, we devise the oblivious encryption (OE, an alternative to oblivious transfer) technique in Section 5.7 to enable the key agents to “obliviously” encrypt the “digits” without a high overhead. Moreover, we use a chain of key agents to encrypt each digit so that the key for the digit cannot be compromised unless all the key agents in one chain are compromised. To further prevent the adversary from using the location information (used in OE) and the order information (between the plaintexts and ciphertexts), we require that each key agent in the chain randomly permute the vector it receives (vector permutation) and substitute half of the elements in the vector (data mutation) to randomize the order of the elements in the vector. We develop a complete solution, the OE-DOPE protocol, based on the basic DOPE protocol, OE, and the key agent chain with the vector permutation and data mutation approaches. The


performance study of the OPE algorithms and the protocols for multi-user systems is conducted in Section 5.8. Finally, we summarize this chapter in Section 5.9.

5.1 Background

Let λ be the security parameter and ν be a negligible function. Let $x \xleftarrow{\$} A$ denote that x is uniformly randomly selected from set A, $x \xleftarrow{\$} \mathcal{X}$ denote that randomized algorithm 𝒳 returns value x, and $\mathcal{X}^{\mathcal{Y}}$ denote that algorithm 𝒳 has access to oracle 𝒴. To facilitate thorough analysis, we consider OPE schemes in two domains, the polynomial-sized domain and the superpolynomial-sized domain. Let m denote the size of the plaintext domain [m] = {1, ..., m} and n the size of the ciphertext range [n]. If m is a polynomial of λ, then the OPE scheme is in the polynomial-sized domain, and if m is a superpolynomial of λ, then the OPE scheme is in the superpolynomial-sized domain. Various security notions are defined for these two domains. We

first introduce the fundamental security notion IND-CPA (indistinguishability under chosen-

plaintext attack) and define it in Definition 5.1.1.

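Definition 5.1.1 (IND-CPA): Let 𝒮𝒠 = (𝒦, 𝒠, 𝒟) be a symmetric-key encryption scheme and b ∈ {0,1}. Let $\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))$ be a left-or-right encryption oracle such that for queries $\{(x_u^0, x_u^1)\}_{1 \le u \le h}$, it returns

\[ \mathcal{E}(x_u^b, k) \xleftarrow{\$} \mathcal{E}_k(\mathcal{LR}(x_u^0, x_u^1, b)) \]

for 1 ≤ u ≤ h. Let 𝒜 be an adversary that can access $\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))$ and finally returns a bit b' as a guess of b. Consider the following experiment.

Experiment $\mathbf{EXP}^{\text{IND-CPA-}b}_{\mathcal{SE},\mathcal{A}}$

\[ k \xleftarrow{\$} \mathcal{K};\quad b' \xleftarrow{\$} \mathcal{A}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))};\quad \text{Return } b' \]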


The encryption scheme 𝒮𝒠 is said to be secure under IND-CPA if for every probabilistic polynomial time (PPT) adversary 𝒜, the advantage of 𝒜, defined by

\[ \mathbf{ADV}^{\text{IND-CPA}}_{\mathcal{SE},\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-CPA-}1}_{\mathcal{SE},\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-CPA-}0}_{\mathcal{SE},\mathcal{A}} = 1], \]

is bounded by a negligible function of the security parameter.

OPE schemes are not secure under IND-CPA because the ciphertexts leak the order information of the plaintexts. Consider an adversary that queries $(x_1^0, x_1^1)$ and $(x_2^0, x_2^1)$, where $x_1^0 < x_2^0$ and $x_1^1 \ge x_2^1$. If b = 0, $x_1^0$ and $x_2^0$ will be encrypted, where $x_1^0 < x_2^0$; if b = 1, $x_1^1$ and $x_2^1$ will be encrypted, where $x_1^1 \ge x_2^1$. Since OPE preserves order, the adversary can distinguish whether the plaintexts are $x_1^0$ and $x_2^0$ or $x_1^1$ and $x_2^1$ by comparing the corresponding ciphertexts. Thus, the advantage of such an adversary is 1.
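The order-leak argument can be checked with a short Python sketch. The toy OPE (a random strictly increasing table) and the names `toy_ope_keygen`, `lr_oracle`, and `order_adversary` are assumptions introduced for illustration; the adversary instantiates the queries above with (1, 2) and (2, 1).

```python
import random

def toy_ope_keygen(m, n, rng):
    """Toy OPE key: a random strictly increasing map from [m] to [n]."""
    return dict(zip(range(1, m + 1), sorted(rng.sample(range(1, n + 1), m))))

def lr_oracle(key, b):
    """Left-or-right oracle: encrypts the left plaintext if b == 0, else the right one."""
    return lambda x0, x1: key[x1 if b else x0]

def order_adversary(oracle):
    """Left plaintexts are 1 < 2, right plaintexts are 2 >= 1, so the order reveals b."""
    y1 = oracle(1, 2)
    y2 = oracle(2, 1)
    return 0 if y1 < y2 else 1
```

For any key of the toy scheme the adversary outputs b exactly, which corresponds to advantage 1 in the IND-CPA experiment.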

In [12], the security notion is weakened to IND-OCPA (indistinguishability under ordered chosen-plaintext attack), where the adversary is forbidden to query plaintexts with different orders.

Definition 5.1.2 (IND-OCPA [12]): IND-OCPA has the same definition as that of IND-CPA except that the adversary is only allowed to query $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$, where the condition

\[ x_u^0 < x_v^0 \iff x_u^1 < x_v^1, \quad 1 \le u, v \le h \]

is satisfied.

IND-OCPA is the highest security notion (with respect to indistinguishability and the left-or-right encryption oracle) for OPE algorithms. However, in [12], it has been shown that OPE schemes are susceptible to the following big jump attack under IND-OCPA.


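Definition 5.1.3 (Big jump attack [12]): Consider the following PPT adversary $\mathcal{A}_{BJ}$ with three oracle queries in the experiment of security notion IND-OCPA.

Adversary $\mathcal{A}_{BJ}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))}$

\[ \begin{aligned} & x \xleftarrow{\$} \{1, \ldots, m-1\} \\ & y_1 \leftarrow \mathcal{E}_k(\mathcal{LR}(1, x, b)) \\ & y_2 \leftarrow \mathcal{E}_k(\mathcal{LR}(x, x+1, b)) \\ & y_3 \leftarrow \mathcal{E}_k(\mathcal{LR}(x+1, m, b)) \end{aligned} \]

Return 1 if y3 − y2 > y2 − y1; else return 0.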

In the big jump attack, the attacker chooses the left plaintexts 1, x, and x+1, and the right plaintexts x, x+1, and m, where x is randomly selected from {1, ..., m−1}. From the ciphertexts, if y3 − y2 > y2 − y1, then the attacker guesses that the right plaintexts were encrypted; if y3 − y2 ≤ y2 − y1, then the attacker guesses that the left plaintexts were encrypted. Since the distance between two ciphertexts reflects, to some extent, the distance between the corresponding two plaintexts, such a guess has a high probability of being correct. The lower bound on the advantage of the adversary has been derived in [12] and is cited in Lemma 5.1.1.

Lemma 5.1.1:

\[ \mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}_{BJ}} \ge 1 - \frac{2\log n}{m-1} \]
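The big jump adversary of Definition 5.1.3 can be simulated empirically. The sketch below is only an illustration: it runs the adversary against a uniformly random order-preserving function (a stand-in for an OPE scheme), and the helper names `random_opf` and `estimate_advantage`, as well as the Monte Carlo estimate itself, are assumptions that do not appear in the text.

```python
import random

def random_opf(m, n, rng):
    """Uniformly random strictly increasing function from [m] to [n]."""
    values = sorted(rng.sample(range(1, n + 1), m))
    return lambda x: values[x - 1]

def big_jump_adversary(oracle, m, rng):
    """Three left-or-right queries as in Definition 5.1.3."""
    x = rng.randrange(1, m)          # uniform in {1, ..., m-1}
    y1 = oracle(1, x)
    y2 = oracle(x, x + 1)
    y3 = oracle(x + 1, m)
    return 1 if y3 - y2 > y2 - y1 else 0

def estimate_advantage(m, n, trials, seed=0):
    """Estimate Pr[A outputs 1 | b=1] - Pr[A outputs 1 | b=0] by sampling."""
    rng = random.Random(seed)
    ones = [0, 0]
    for b in (0, 1):
        for _ in range(trials):
            f = random_opf(m, n, rng)
            oracle = lambda x0, x1, f=f, b=b: f(x1 if b else x0)
            ones[b] += big_jump_adversary(oracle, m, rng)
    return (ones[1] - ones[0]) / trials
```

A call such as `estimate_advantage(200, 10**6, 300)` returns a sampled estimate that can be compared against the bound of Lemma 5.1.1 for the same m and n.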

Remark 5.1.1: Note that for efficient OPE, both log m and log n should be bounded by a polynomial of λ. Therefore $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}_{BJ}} \ge 1 - \nu(\lambda)$ if m is a superpolynomial of λ, which implies that it is impossible to construct an OPE that is secure under IND-OCPA if m is a superpolynomial of λ. However, the lower bound on the advantage of the adversary does not


eliminate the possibility for designing an OPE scheme that is secure under IND-OCPA if m is

bounded by a polynomial of λ.

Because of the big jump attack, the authors in ‎[12] take an alternative approach: They

define the security notion POPF-CCA (pseudorandom order-preserving function under chosen-

ciphertext attack) based on the ideal OPE object defined as follows.

Definition 5.1.4 (Ideal OPE Object): We say that 𝒮𝒠* = (𝒦*, 𝒠*, 𝒟*) is the ideal OPE object if

- 𝒦* uniformly randomly selects $f \in OPE_{m,n} = \{g: [m] \to [n] \mid x < x' \Rightarrow g(x) < g(x')\}$;

- 𝒠* encrypts x to f(x);

- 𝒟* decrypts y to $f^{-1}(y)$.

A “real” OPE scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟) is secure under POPF-CCA if it is computationally indistinguishable from the ideal OPE object 𝒮𝒠* = (𝒦*, 𝒠*, 𝒟*). Formally, the security notion POPF-CCA is defined as follows.

Definition 5.1.5 (POPF-CCA [12]): Let the advantage of the adversary in POPF-CCA be

\[ \mathbf{ADV}^{\text{POPF-CCA}}_{\mathcal{SE},\mathcal{A}} = \Pr[k \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{\mathcal{E}(k,\cdot),\mathcal{D}(k,\cdot)} = 1] - \Pr[k \xleftarrow{\$} \mathcal{K}^* : \mathcal{A}^{\mathcal{E}^*(k,\cdot),\mathcal{D}^*(k,\cdot)} = 1]. \]

The encryption scheme 𝒮𝒠 is said to be secure under POPF-CCA if $\mathbf{ADV}^{\text{POPF-CCA}}_{\mathcal{SE},\mathcal{A}}$ is bounded by a negligible function of the security parameter for every PPT adversary 𝒜.

Based on the security notion POPF-CCA, the authors in [12] construct a real OPE scheme and prove that it is secure under POPF-CCA. In other words, in their approach the ideal OPE object is used as the security goal, and a real OPE scheme is constructed to achieve that security goal.


However, the question is whether the ideal OPE object is always the most secure OPE. We construct a counterexample in Section 5.3 to show that the answer is negative.

5.2 Security of OPE

In [12], the authors reduce the security of 𝒮𝒠 to the security of the ideal object, but they do not analyze the security of the ideal OPE object itself. As an obvious counterexample, the ideal object is not secure when n = m. Indeed, there exists no secure OPE scheme when n = m because the encryption algorithm is necessarily the identity function. In [12], the authors left open the questions of how to measure the security of the ideal OPE object and how to choose n given m.

In this section, we analyze the security of the ideal OPE object. First, we need to establish

the attack model for the analysis. The security notions considered in [12], e.g., IND-CPA, IND-DCPA, and IND-OCPA, are all related to chosen plaintext attacks. In the security notion of IND-CPA, the adversary is allowed to make queries of the form $\{(x_i^0, x_i^1)\}_{i=1}^{h}$. Afterwards, the left-or-right encryption oracle returns the ciphertexts $\{\mathcal{E}(x_i^b, k)\}_{i=1}^{h}$ to the adversary, where b is a randomly selected bit. The security of the encryption scheme depends on how precisely the

adversary can predict b. The form of queries in the game of IND-CPA is specialized to facilitate

the definition of indistinguishability. IND-DCPA and IND-OCPA consider similar security

games except for the fact that they give additional constraints on the queries. Another effective

security game against the OPE scheme is to reverse the order of the chosen plaintext attack, i.e.,

the adversary is given the ciphertext, called the challenge, and subsequently chooses the

plaintexts. In this case, the adversary can reverse 𝒠(x, k) by the following binary-search chosen


plaintext attack. The adversary 𝒜 begins the attack by choosing the midpoint $p = \frac{m+1}{2}$ and asks the encryption oracle to encrypt p. If 𝒠(x, k) = 𝒠(p, k), then the adversary knows that x = p. If 𝒠(x, k) > 𝒠(p, k), then the adversary knows that x > p, and 𝒜 continues the attack by choosing the plaintext $\frac{p+m}{2}$. If 𝒠(x, k) < 𝒠(p, k), then 𝒜 knows that x < p and continues the attack by choosing the plaintext $\frac{1+p}{2}$. Thus, after at most log m chosen plaintext queries, the adversary can reverse 𝒠(x, k). The security notions in these models are too strong, and OPE schemes cannot achieve the security level of such security games.
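The binary-search attack described above can be sketched in a few lines of Python. The toy OPE (`toy_ope`, a random strictly increasing table) is an assumption used only to have something to attack, and the search uses the usual integer midpoints with interval narrowing, a minor variant of the midpoints written in the text.

```python
import random

def toy_ope(m, n, rng):
    """Toy deterministic OPE: a random strictly increasing function [m] -> [n]."""
    table = sorted(rng.sample(range(1, n + 1), m))
    return lambda x: table[x - 1]

def binary_search_attack(challenge, encrypt_oracle, m):
    """Recover x from challenge = E(x, k) using only encryption-oracle queries."""
    lo, hi, queries = 1, m, 0
    while lo <= hi:
        p = (lo + hi) // 2           # first midpoint is floor((m + 1) / 2)
        queries += 1
        c = encrypt_oracle(p)
        if c == challenge:
            return p, queries
        if c < challenge:            # challenge plaintext is larger than p
            lo = p + 1
        else:                        # challenge plaintext is smaller than p
            hi = p - 1
    raise ValueError("challenge is not a ciphertext of any plaintext in [m]")
```

The number of oracle queries grows logarithmically in m, which is why this game is too strong for any OPE scheme.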

We develop a new attack model by considering a common scenario in third party hosting

with potential external attacks. Let O denote the owner of a database DB, where DB and its

corresponding querying logic are hosted on the Web by a third party Host. DB is encrypted using

an OPE scheme to protect its secrecy and O holds the encryption key. DB can be accessed by

various clients in CL and O may distribute the encryption key to legitimate clients in CL. The

goal is to protect DB from potential attacks. We assume that a public key infrastructure is in

place and the identities of individuals in O, CL, Host, and outsiders can be authenticated

correctly. Note that it is not possible to protect DB against any key holders in O and CL. At the

same time, it is not possible for an individual (attacker) without a key to arbitrarily choose a

plaintext and obtain the corresponding ciphertext. An attacker may happen to know some

plaintexts and be able to find out the corresponding ciphertexts. Thus, we do not consider chosen

plaintext attacks such as those in [12]. Instead, we consider the known plaintext attack model, where the adversary is given a ciphertext 𝒠(x, k) (called the challenge) to compromise. The attack model is formally given in Definition 5.2.1.


Definition 5.2.1 (Attack Model): The known plaintext attack model we consider involves an adversary with h pairs of known plaintexts and ciphertexts. Let $KP = \{(x_i, \mathcal{E}(x_i, k)) \mid 1 \le i \le h\}$ denote the set of h plaintext/ciphertext pairs. Then, the adversary is given a ciphertext 𝒠(x, k) (called the challenge). The goal of the adversary is to compromise x from the challenge 𝒠(x, k) based on KP.

Next, we need to determine how to generate the challenge 𝒠(x, k) in the attack model. Since the encryption algorithm 𝒠 is deterministic, the adversary can always reverse the ciphertexts $\mathcal{E}(x_i, k)$, where 1 ≤ i ≤ h, since $x_i$ is a known plaintext. Thus, we assume that x is selected from [m]* instead of [m]. Note that in the security definition of conventional deterministic (probabilistic) encryption schemes, it is required that the adversary cannot retrieve any bit of information of any selected x ∈ [m]* (x ∈ [m]) from the corresponding ciphertext against the known plaintext attack. That is to say, the choice of x should not affect the security result. However, the OPE scheme cannot reach such a security level. Suppose that the adversary knows the plaintext/ciphertext pairs in the set KP = {(x, 𝒠(x, k)), (x + 2, 𝒠(x + 2, k))}, where 1 ≤ x ≤ m − 2. Since a ciphertext y is encrypted from plaintext x+1 if and only if 𝒠(x, k) < y < 𝒠(x + 2, k), the adversary can recover plaintext x+1 from 𝒠(x + 1, k) based on KP. Therefore, worst-case security is not suitable for quantifying the security of the OPE scheme. Hence, we consider average-case security for the OPE scheme instead. We assign weights to the elements in [m]*, and consider the expected security on [m]*. Factors such as the data access distribution and the adversary's personal interest could affect weight assignments on [m]*. However, without prior information about the application environment, there is no way to tell which data is


more/less important. Thus, in this paper, we assume that the elements in [m]* are evenly weighted, i.e., x is uniformly selected from [m]*. The security analysis based on this assumption can be the basis for further analysis considering a non-evenly weighted [m]* for the choice of the challenge.

According to the attack model discussed above, the security of OPE schemes can be

measured by the one-wayness security defined in Definition 5.2.2.

Definition 5.2.2 (One-Wayness Security): We say that an encryption scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟) achieves one-wayness security if

\[ \Pr[\mathcal{A}(\mathcal{E}(x, k), KP) = x] = \nu(\lambda) \]

for any PPT (probabilistic polynomial time) adversary 𝒜, where x is chosen uniformly randomly from the plaintext domain, $KP = \{(x_i, \mathcal{E}(x_i, k)) \mid 1 \le i \le h\}$ is the set of h (h is bounded by a polynomial of λ) plaintext/ciphertext pairs known by 𝒜, $x_1, \ldots, x_h$ are also chosen uniformly randomly from the plaintext domain, and ν denotes a negligible function.

Consider the adversary 𝒜 who randomly outputs an element in the plaintext domain. Then the success probability for 𝒜 to reverse the ciphertext 𝒠(x, k) is 1/m. Since λ will be set to be log m [46], 𝒜 succeeds with negligible probability. However, this is not a complete security proof because there may be other adversaries. In fact, we can show that by choosing $n \ge m^3 > 1$, the ideal OPE object achieves one-wayness security, i.e., the probability for any adversary to fully recover a plaintext is a negligible function of the security parameter λ = log m if the number h of known plaintext/ciphertext pairs satisfies $h = o(m^{\varepsilon})$, 0 < ε < 1. The proof is relegated to the


appendices and the result is stated in Theorem 5.2.1. Therefore, the real OPE schemes that are computationally indistinguishable from the ideal OPE object also have one-wayness security.

Theorem 5.2.1: The ideal OPE object 𝒮𝒠* achieves one-wayness security.

5.3 The Limitation of the Ideal OPE Object

In this section we show that there exists a situation in which the ideal OPE object is not the most secure OPE. We consider a specific plaintext domain [m] and ciphertext range [n], construct a real OPE scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟) for [m] and [n], prove that 𝒮𝒠 is secure under IND-OCPA, and prove that the ideal OPE object 𝒮𝒠∗ for [m] and [n] is not secure under IND-OCPA.

Plaintext domain and ciphertext range: In this section, let m = 2 and $n = 2^{\lambda}$, where λ is the security parameter. Then the plaintext domain is [m] = {1, 2} and the ciphertext range is [n] = $\{j \mid 1 \le j \le 2^{\lambda}\}$.

The real OPE scheme: First we construct a real OPE scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟) as follows.

- 𝒦: It uniformly randomly selects $f \in \{g: [m] \to [n] \mid g(2) = g(1) + 1\}$;

- 𝒠: For plaintext x, it returns f(x);

- 𝒟: For ciphertext y, it returns $f^{-1}(y)$.

Unlike the ideal OPE object, in the real OPE scheme 𝒮𝒠 the encryption function is

uniformly randomly selected from a subset of order-preserving functions. The encryption

function 𝒠 has the property such that 1 is encrypted to a random element r in [1, n−1] while 2 is

encrypted to r+1. To show that the real OPE scheme 𝒮𝒠 is secure under IND-OCPA, we


compute the statistical distance between the probability distribution of ciphertexts for plaintext 1

and the probability distribution of ciphertexts for plaintext 2, and prove that it is negligibly small.

Based on this fact, we show that the success probability of every attack in IND-OCPA is also

negligibly small.

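Lemma 5.3.1: Let Δ be the statistical distance between 𝒠(1) and 𝒠(2). Then Δ = ν(λ).

Proof. According to the definition of 𝒠, 𝒠(i) ∈ [n] follows the probability distribution such that

\[ \Pr[\mathcal{E}(1) = j] = \begin{cases} \frac{1}{n-1} & \text{for } 1 \le j < n \\ 0 & \text{for } j = n \end{cases} \quad\text{and}\quad \Pr[\mathcal{E}(2) = j] = \begin{cases} 0 & \text{for } j = 1 \\ \frac{1}{n-1} & \text{for } 1 < j \le n \end{cases} \]

Thus

\[ \Delta = \frac{1}{2} \sum_{j} \left| \Pr[\mathcal{E}(1) = j] - \Pr[\mathcal{E}(2) = j] \right| = \frac{1}{n-1} = \frac{1}{2^{\lambda}-1} = \nu(\lambda). \]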

Proposition 5.3.2: 𝒮𝒠 is secure under IND-OCPA. Specifically, $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = \nu(\lambda)$ for every PPT adversary 𝒜.

Proof. Note that the adversary has to query ordered plaintext pairs to ℒℛ in IND-OCPA, and all possible queries of the adversary are {(1,1)}, {(2,2)}, {(1,1),(2,2)}, {(1,2)}, and {(2,1)}. We analyze the security of 𝒮𝒠 according to these queries.

(1) The adversary queries {(1,1)} to ℒℛ. In this case, since the left plaintext equals the right plaintext, the returned ciphertexts cannot help the adversary decide whether the left plaintext or the right plaintext is encrypted. Hence $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = 0$.


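(2) The adversary queries {(2,2)} or {(1,1),(2,2)} to ℒℛ. The situation is similar to that in (1) and hence $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = 0$.

(3) The adversary queries {(1,2)} to ℒℛ. According to Lemma 5.3.1,

\[ \mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-OCPA-}1}_{\mathcal{SE},\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-OCPA-}0}_{\mathcal{SE},\mathcal{A}} = 1] = \Delta = \nu(\lambda). \]

(4) The adversary queries {(2,1)} to ℒℛ. The situation is similar to that in (3) and hence $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = \nu(\lambda)$.

According to (1) - (4), $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}} = \nu(\lambda)$ for every PPT adversary 𝒜.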

The ideal OPE object: According to Definition 5.1.4, the ideal OPE object 𝒮𝒠∗ = (𝒦∗, 𝒠∗, 𝒟∗) is defined as follows.

- 𝒦∗: It uniformly randomly selects $f \in \{g: [m] \to [n] \mid g(1) < g(2)\}$;

- 𝒠∗: For plaintext x, it returns f(x);

- 𝒟∗: For ciphertext y, it returns $f^{-1}(y)$.

To show that the ideal OPE object 𝒮𝒠∗ is not secure under IND-OCPA, we compute the

statistical distance between the probability distribution of ciphertexts for plaintext 1 and the

probability distribution of ciphertexts for plaintext 2, and prove that it is significant (greater than

a positive constant). Based on this fact, we design an attack to distinguish left plaintext 1 and

right plaintext 2 according to the returned ciphertext y by comparing the conditional probabilities

Pr[y | 1] and Pr[y | 2]. It can be shown that the success probability of the attack is non-negligible

(greater than a positive constant).

Lemma 5.3.3: Let Δ∗ be the statistical distance between 𝒠∗(1) and 𝒠∗(2). Then Δ∗ = Ω(1).


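Proof. Since $|OPE_{m,n}| = \binom{n}{m}$ and $|\{f \in OPE_{m,n} \mid f(i) = j\}| = \binom{j-1}{i-1}\binom{n-j}{m-i}$, for i ∈ [m], 𝒠∗(i) ∈ [n] follows the negative hypergeometric distribution

\[ \Pr[\mathcal{E}^*(i) = j] = \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n}{m}}, \quad 1 \le j \le n. \]

Thus

\[ \begin{aligned} \Delta^* &= \frac{1}{2} \sum_{j} \left| \frac{\binom{j-1}{0}\binom{n-j}{m-1}}{\binom{n}{m}} - \frac{\binom{j-1}{1}\binom{n-j}{m-2}}{\binom{n}{m}} \right| = \frac{1}{2} \sum_{j} \left| \frac{\binom{j-1}{0}\binom{n-j}{1}}{\binom{n}{2}} - \frac{\binom{j-1}{1}\binom{n-j}{0}}{\binom{n}{2}} \right| \\ &= \frac{\sum_{j} |n - 2j + 1|}{2\binom{n}{2}} = \frac{n}{2(n-1)} \ge \frac{1}{2} = \Omega(1). \end{aligned} \]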
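The two statistical distances of Lemma 5.3.1 and Lemma 5.3.3 can be verified by exact computation for a concrete n. The following Python sketch uses rational arithmetic; the function names `real_scheme_pmf`, `ideal_object_pmf`, and `statistical_distance` are illustrative assumptions, and the distributions are taken directly from the two proofs.

```python
from fractions import Fraction
from math import comb

def real_scheme_pmf(i, n):
    """Pr[E(i) = j] for the real scheme with m = 2: f(1) = r, f(2) = r + 1."""
    lo, hi = (1, n - 1) if i == 1 else (2, n)
    return {j: Fraction(1, n - 1) if lo <= j <= hi else Fraction(0)
            for j in range(1, n + 1)}

def ideal_object_pmf(i, m, n):
    """Pr[E*(i) = j]: negative hypergeometric distribution of the ideal OPE object."""
    return {j: Fraction(comb(j - 1, i - 1) * comb(n - j, m - i), comb(n, m))
            for j in range(1, n + 1)}

def statistical_distance(p, q):
    """Half of the L1 distance between two distributions on the same support."""
    return sum(abs(p[j] - q[j]) for j in p) / 2
```

For example, with n = 2**6 the first distance evaluates to 1/(n−1) and the second to n/(2(n−1)), in agreement with the closed forms derived in the proofs.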

For the ideal OPE object, if 1 is encrypted to j, then 2 must be encrypted to an element of [j+1, n], and hence there are more choices for the encryption of 2 if j is small; similarly, if 2 is encrypted to j, then 1 must be encrypted to an element of [1, j−1], and hence there are more choices for the encryption of 1 if j is large. Since the encryption function of the ideal OPE object is uniformly randomly selected from all order-preserving functions, 1 is more likely to be encrypted to [1, (n+1)/2] and 2 is more likely to be encrypted to [(n+1)/2, n]. Lemma 5.3.3 indicates that the difference between the encryptions of 1 and 2 is significant. Such a significant difference can be used to design an attack, and based on the attack we prove that the ideal OPE object is not secure under IND-OCPA in Proposition 5.3.4.

Proposition 5.3.4: For the ideal OPE object 𝒮𝒠∗ with the plaintext domain [m] and the

ciphertext range [n], there exists an adversary 𝒜 who can distinguish plaintexts 1 and 2 with one


oracle query under IND-OCPA such that $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE}^*,\mathcal{A}} = \Omega(1)$. In other words, the ideal OPE object 𝒮𝒠∗ is not secure under IND-OCPA.

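Proof. Since $|OPE_{m,n}| = \binom{n}{m}$ and $|\{f \in OPE_{m,n} \mid f(i) = j\}| = \binom{j-1}{i-1}\binom{n-j}{m-i}$, for i ∈ [m], 𝒠∗(i) ∈ [n] follows the negative hypergeometric distribution

\[ \Pr[\mathcal{E}^*(i) = j] = \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n}{m}}, \quad 1 \le j \le n. \]

Note that

\[ \begin{aligned} \frac{\binom{j-1}{0}\binom{n-j}{m-1}}{\binom{n}{m}} > \frac{\binom{j-1}{1}\binom{n-j}{m-2}}{\binom{n}{m}} &\iff \binom{n-j}{m-1} > (j-1)\binom{n-j}{m-2} \\ &\iff n - j - m + 2 > (j-1)(m-1) \\ &\overset{m=2}{\iff} n - j > j - 1 \\ &\iff j < (n+1)/2. \end{aligned} \]

Thus we construct the PPT adversary 𝒜 with one oracle query in the experiment of security notion IND-OCPA as follows (note that y ≠ (n+1)/2 since $n = 2^{\lambda}$).

Adversary $\mathcal{A}^{\mathcal{E}^*_k(\mathcal{LR}(\cdot,\cdot,b))}$

\[ y \leftarrow \mathcal{E}^*_k(\mathcal{LR}(1, 2, b)) \]

Return 0 if y < (n+1)/2

Return 1 if y > (n+1)/2

Then

\[ \mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE}^*,\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-OCPA-}1}_{\mathcal{SE}^*,\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-OCPA-}0}_{\mathcal{SE}^*,\mathcal{A}} = 1] = \Delta^* = \Omega(1). \]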
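The claim that the threshold adversary attains advantage exactly Δ∗ can be checked by exact computation for m = 2. The sketch below is an illustration under that assumption; the names `ideal_pmf` and `threshold_adversary_advantage` are hypothetical helpers, not part of the construction.

```python
from fractions import Fraction
from math import comb

def ideal_pmf(i, n):
    """Pr[E*(i) = j] for j = 1..n in the ideal OPE object with m = 2."""
    return [Fraction(comb(j - 1, i - 1) * comb(n - j, 2 - i), comb(n, 2))
            for j in range(1, n + 1)]

def threshold_adversary_advantage(n):
    """Advantage of the adversary that returns 1 iff y > (n + 1) / 2."""
    p1, p2 = ideal_pmf(1, n), ideal_pmf(2, n)
    # b = 1 encrypts the right plaintext 2; b = 0 encrypts the left plaintext 1
    pr_one_given_b1 = sum(p for j, p in enumerate(p2, start=1) if 2 * j > n + 1)
    pr_one_given_b0 = sum(p for j, p in enumerate(p1, start=1) if 2 * j > n + 1)
    return pr_one_given_b1 - pr_one_given_b0
```

For even n the returned value coincides with n/(2(n−1)), the statistical distance computed in Lemma 5.3.3, because the adversary outputs 1 exactly on the ciphertexts where plaintext 2 is the more likely preimage.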


Remark 5.3.1: The proofs in Lemma 5.3.3 and Proposition 5.3.4 can be generalized to

show that the ideal OPE object is not secure under IND-OCPA for any plaintext domain [m] and

ciphertext range [n].

We conclude the results in this section in the following theorem.

Theorem 5.3.5: The ideal OPE object 𝒮𝒠∗ is not the most secure OPE for m = 2 and $n = 2^{\lambda}$. Specifically, there exists a real OPE scheme 𝒮𝒠 secure under IND-OCPA while the ideal OPE object 𝒮𝒠∗ is not secure under IND-OCPA.

5.4 Generalized OPE (GOPE)

5.4.1 Generalized OPE in the Polynomial-sized Domain

We define the concept of the generalized OPE (GOPE) scheme. Unlike OPE whose

ciphertext-space is [n], GOPE adopts general mathematical objects as ciphertexts. Hence a

special comparison algorithm is needed to compare the ciphertexts.

Definition 5.4.1 (GOPE scheme): A GOPE scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟, 𝒞) is a symmetric-key encryption scheme, where $\mathcal{K}: \{0,1\}^* \to \{0,1\}^*$ is a key generation algorithm, $\mathcal{E}: [m] \times \{0,1\}^* \to R$ is an encryption algorithm, $\mathcal{D}: R \times \{0,1\}^* \to [m]$ is a decryption algorithm, and $\mathcal{C}: R \times R \to \{=, >, <\}$ is a comparison algorithm. 𝒮𝒠 satisfies that

\[ \Pr[\mathcal{D}(\mathcal{E}(x, k), k) = x] > 1 - \nu(\lambda) \]

for any x ∈ [m] and key k, and

\[ \Pr[\mathcal{C}(\mathcal{E}(x, k), \mathcal{E}(x', k)) = w] > 1 - \nu(\lambda) \]

for any x w x' and w ∈ {=, >, <}.


Next we construct the GOPE scheme 𝒮𝒠₂ = (𝒦₂, 𝒠₂, 𝒟₂, 𝒞₂) with m being a polynomial of λ, and prove that it is secure under IND-OCPA. In 𝒮𝒠₂ the ciphertext y for plaintext x is a “set”. An element in y is a share of the relation between x and x', for all other plaintexts x'. When comparing x and x', the matching pair of shares from x and x' can be retrieved to reconstruct the relation (x < x' or x > x'). Let the symbol “<” be encoded as 1 ∈ Z3 and the symbol “>” be encoded as 2 ∈ Z3. 𝒮𝒠₂ is constructed as follows.

- 𝒦₂: Given the domain size m, it randomly picks a permutation π of the set {(x, x') | 1 ≤ x < x' ≤ m}, and randomly generates $r_{xx'} \in Z_3$ for 1 ≤ x < x' ≤ m. It returns $\{(\pi, r_{xx'}) \mid 1 \le x < x' \le m\}$;

- 𝒠₂: For plaintext x, it returns the ciphertext $y = \{(\pi(x', x), r_{x'x}) \mid x' < x\} \cup \{(\pi(x, x'), 1 + r_{xx'}) \mid x' > x\}$;

- 𝒟₂: For ciphertext y, it retrieves (any) two elements (i, s) and (i', s') from the set y, and returns the plaintext x which appears in both $\pi^{-1}(i)$ and $\pi^{-1}(i')$;

- 𝒞₂: For ciphertexts y and y', if y = y', it returns =. Otherwise, it retrieves (i, s) from the set y and (i, s') from the set y'; if s − s' = 1, it returns <; if s − s' = 2, it returns >.
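The construction above can be prototyped directly. The Python sketch below is only an illustration: the data layout (a dictionary for π, frozensets for ciphertexts) and the function names `keygen`, `encrypt`, `decrypt`, and `compare` are assumptions, arithmetic on the shares is taken modulo 3, and decryption assumes m ≥ 3 so that a ciphertext holds at least two elements.

```python
import random
from itertools import combinations

def keygen(m, rng):
    """K2: a random permutation pi of the pairs (x, x') with x < x', and r in Z_3 per pair."""
    pairs = list(combinations(range(1, m + 1), 2))
    images = pairs[:]
    rng.shuffle(images)
    pi = dict(zip(pairs, images))
    return {"m": m, "pi": pi,
            "pi_inv": {v: k for k, v in pi.items()},
            "r": {p: rng.randrange(3) for p in pairs}}

def encrypt(x, key):
    """E2: one share per other plaintext x'; the smaller plaintext holds 1 + r, the larger r."""
    pi, r, m = key["pi"], key["r"], key["m"]
    smaller = {(pi[(xp, x)], r[(xp, x)]) for xp in range(1, x)}
    larger = {(pi[(x, xp)], (1 + r[(x, xp)]) % 3) for xp in range(x + 1, m + 1)}
    return frozenset(smaller | larger)

def decrypt(y, key):
    """D2: x is the only plaintext shared by the preimages of two distinct indices (m >= 3)."""
    (i, _), (i2, _) = sorted(y)[:2]
    return (set(key["pi_inv"][i]) & set(key["pi_inv"][i2])).pop()

def compare(y, y2):
    """C2: find the common index and decode s - s' in Z_3 (1 means <, 2 means >)."""
    if y == y2:
        return "="
    shares = dict(y)
    for i, s2 in y2:
        if i in shares:
            return "<" if (shares[i] - s2) % 3 == 1 else ">"
    raise ValueError("ciphertexts do not share an index")
```

Note that `compare` needs no key: the order is reconstructed from the two matching shares alone, which is what distinguishes a GOPE ciphertext from a numeric OPE ciphertext.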

The efficiency, correctness, and security of 𝒮𝒠2 are presented in Lemma 5.4.1 and

Theorem 5.4.2.

Lemma 5.4.1: 𝒮𝒠2 is efficient and correct.

The efficiency of 𝒮𝒠₂ and the correctness of the decryption algorithm can be easily verified. It suffices to verify the correctness of the comparison algorithm. For x = x', since 𝒠₂(x, k) = 𝒠₂(x', k), it is correct for the comparison algorithm to return =. For x ≠ x', there exist unique i, s, s' such


that (i, s) ∈ 𝒠₂(x, k) and (i, s') ∈ 𝒠₂(x', k). If x < x', then $\mathcal{E}_2(x, k) = \{\cdots, (\pi(x, x'), 1 + r_{xx'}), \cdots\}$ and $\mathcal{E}_2(x', k) = \{\cdots, (\pi(x, x'), r_{xx'}), \cdots\}$; thus $(1 + r_{xx'}) - r_{xx'} = 1$, and it is correct for the comparison algorithm to return <. If x > x', then $\mathcal{E}_2(x, k) = \{\cdots, (\pi(x', x), r_{x'x}), \cdots\}$ and $\mathcal{E}_2(x', k) = \{\cdots, (\pi(x', x), 1 + r_{x'x}), \cdots\}$; thus $r_{x'x} - (1 + r_{x'x}) = -1 = 2$ in Z3, and it is correct for the comparison algorithm to return >.

Theorem 5.4.2: 𝒮𝒠₂ is secure under IND-OCPA. Specifically, $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE}_2,\mathcal{A}} = 0$.

Proof. Assume that the adversary queries $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$ under IND-OCPA. According to the restriction under IND-OCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since querying two identical plaintext pairs does not increase the advantage, it suffices to consider $x_1^0 < x_2^0 < \cdots < x_h^0$ and $x_1^1 < x_2^1 < \cdots < x_h^1$. Hence, the adversary views $(\mathcal{E}_2(x_1^0, k), \cdots, \mathcal{E}_2(x_h^0, k))$ for b = 0, and the adversary views $(\mathcal{E}_2(x_1^1, k), \cdots, \mathcal{E}_2(x_h^1, k))$ for b = 1. It suffices to prove that the above two probability distributions are identical because it implies that $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE}_2,\mathcal{A}} = 0$.

We use mathematical induction on h to prove that the two probability distributions $(\mathcal{E}_2(x_1^0, k), \cdots, \mathcal{E}_2(x_h^0, k))$ and $(\mathcal{E}_2(x_1^1, k), \cdots, \mathcal{E}_2(x_h^1, k))$ are identical. For h = 1, it is necessary to show that the probability distribution $\mathcal{E}_2(x_1^0, k)$ equals the probability distribution $\mathcal{E}_2(x_1^1, k)$. Let Π = {(x, x') | 1 ≤ x < x' ≤ m}. Let $I_j$, 1 ≤ j ≤ m−1, be the probability distribution such that

\[ \Pr[I_1 = i_1, \ldots, I_{m-1} = i_{m-1}] = \frac{1}{\prod_{0 \le j \le m-2} (|\Pi| - j)} \]

for $(i_1, \ldots, i_{m-1}) \in \Pi^{m-1}$ and $i_j \ne i_{j'}$ if j ≠ j'. Let $S_j$, 1 ≤ j ≤ m−1, be the uniform distribution on Z3. Then according to the construction of 𝒠₂,

\[ \mathcal{E}_2(x_1^0, k) = \{(I_j, S_j) \mid 1 \le j \le m-1\} = \mathcal{E}_2(x_1^1, k). \]


We assume that the two probability distributions are identical for h < h'. For h = h', we consider the following two conditional probability distributions

\[ X = \mathcal{E}_2(x_{h'}^0, k) \mid \mathcal{E}_2(x_1^0, k) = y_1, \cdots, \mathcal{E}_2(x_{h'-1}^0, k) = y_{h'-1} \]

and

\[ Y = \mathcal{E}_2(x_{h'}^1, k) \mid \mathcal{E}_2(x_1^1, k) = y_1, \cdots, \mathcal{E}_2(x_{h'-1}^1, k) = y_{h'-1} \]

where $y_u = \{(i_u^j, s_u^j) \in \Pi \times Z_3 \mid 1 \le j \le m-1\}$, 1 ≤ u ≤ h'−1. The values $y_1, \ldots, y_{h'-1}$ affect $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$). First, for 1 ≤ u ≤ h'−1, there exists a unique $i_u^j$ (for some j) that appears in $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$) according to the construction of 𝒠₂. On the other hand, there exists a unique $i_u^{j'}$ (for some j') that appears in $y_{u'}$, 1 ≤ u' ≠ u ≤ h'−1; hence those $i_u^{j'}$ will not appear in $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$). Thus, let

\[ \Pi_u = \{i_u^j \mid i_u^j \text{ appears in } y_u \text{ but does not appear in } y_{u'} \text{ for any } 1 \le u' \ne u \le h'-1\}, \]

for 1 ≤ u ≤ h'−1. Then there exists $i_u^j \in \Pi_u$ such that $(i_u^j, s_u^j - 1)$ appears in $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$), 1 ≤ u ≤ h'−1. Note that the elements of a set are unordered. Without loss of generality, for 1 ≤ u ≤ h'−1, let $(I_u, S_u)$ be the probability distribution such that $\Pr[(I_u, S_u) = (i_u^j, s_u^j - 1)] = \frac{1}{|\Pi_u|}$. The remaining probability distributions are similar to those for the situation of h = 1. Let

\[ \overline{\Pi} = \{(x, x') \in \Pi \mid (x, x') \text{ does not appear in } y_u, 1 \le u \le h'-1\}. \]


for h’ ≤‎u ≤‎m−1,‎ let‎ Iu be the probability distribution such that Pr[Ih’ = ih’, ..., Im−1 = im−1] =

1

( Π −𝑗 )0≤𝑗≤𝑚−𝑕′ −1

for (ih’, ..., im−1) Π𝑚−𝑕 ′

and iu ≠‎ iu’ if u ≠‎u’. Let Su, h’ ≤‎u ≤‎m−1,‎be‎ the‎

uniform distribution on Z3. Then

X = {(Iu, Su)‎|‎1‎≤‎u ≤‎m−1}‎=‎Y. (5.1)

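Consequently, since $X$ and $Y$ are the conditional distributions of the last ciphertext given the first $h'-1$ ciphertexts,

\[ \Pr[\mathcal{E}_2(x_1^0, k) = y_1, \ldots, \mathcal{E}_2(x_{h'}^0, k) = y_{h'}] \]

\[ = \Pr[X = y_{h'}] \cdot \Pr[\mathcal{E}_2(x_1^0, k) = y_1, \ldots, \mathcal{E}_2(x_{h'-1}^0, k) = y_{h'-1}] \]

\[ = \Pr[Y = y_{h'}] \cdot \Pr[\mathcal{E}_2(x_1^1, k) = y_1, \ldots, \mathcal{E}_2(x_{h'-1}^1, k) = y_{h'-1}] \quad \text{(induction hypothesis and (5.1))} \]

\[ = \Pr[\mathcal{E}_2(x_1^1, k) = y_1, \ldots, \mathcal{E}_2(x_{h'}^1, k) = y_{h'}]. \]

Hence the two probability distributions are identical for $h = h'$, which completes the induction.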

Remark 5.4.1: In order to improve the efficiency of $\mathcal{SE}_2$, the random permutation and its inverse can be substituted with deterministic symmetric-key encryption and decryption algorithms, and $r_{xx'}$ can be generated by a pseudorandom number generator. The improved scheme remains secure under IND-OCPA.

5.4.2 IND-OLCPA

Now we consider a security notion for OPE schemes in superpolynomial-sized domains. According to the big jump attack, if $m$ is a superpolynomial of $\lambda$, then the adversary $\mathcal{A}_{BJ}$ can achieve $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{SE},\mathcal{A}_{BJ}} \ge 1 - \nu(\lambda)$ by using three oracle queries. From this we can conclude that IND-OCPA is too strong a security notion for OPE schemes in the superpolynomial-sized domain. Thus, we further weaken IND-OCPA and define the security notion IND-OLCPA (indistinguishability under ordered and local chosen-plaintext attack), where the range of the oracle queries is bounded by a polynomial of $\lambda$ (to prevent the adversary from launching the big jump attack). The definition of IND-OLCPA is given as follows.

Definition 5.4.2 (IND-OLCPA): The security notion IND-OLCPA has the same definition as that of IND-CPA except that the adversary is restricted so that it can only query $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$ where

\[ x_u^0 < x_v^0 \iff x_u^1 < x_v^1 \tag{5.2} \]

for $1 \le u, v \le h$, and there exists a polynomial $g_1$ such that

\[ |x_u^i - x_v^j| \le g_1(\lambda) \tag{5.3} \]

for $1 \le u, v \le h$ and $0 \le i, j \le 1$.
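The two restrictions in Definition 5.4.2 can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name `is_valid_olcpa_query_set` and the argument `g1_value` (standing for the concrete value of $g_1(\lambda)$) are illustrative assumptions.

```python
def is_valid_olcpa_query_set(pairs, g1_value):
    """Check conditions (5.2) and (5.3) for a list of (x0, x1) query pairs.

    pairs    -- list of (left plaintext, right plaintext) tuples
    g1_value -- assumed concrete value of the polynomial bound g1(lambda)
    """
    if not pairs:
        return True
    # Condition (5.2): left plaintexts and right plaintexts have the same order.
    for (u0, u1) in pairs:
        for (v0, v1) in pairs:
            if (u0 < v0) != (u1 < v1):
                return False
    # Condition (5.3): every pairwise distance among all queried plaintexts
    # is at most g1, which is equivalent to max - min <= g1.
    values = [x for pair in pairs for x in pair]
    return max(values) - min(values) <= g1_value
```

For instance, the query set {(5, 5), (6, 7), (8, 8)} is admissible when the bound is 3, whereas {(1, 2), (2, 1)} violates the order condition (5.2).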

We design the following attack (and call it the small jump attack) against OPE schemes

under IND-OLCPA. Similar to the big jump attack, the small jump attack also decides whether

the ciphertexts are encrypted from the left or the right plaintexts based on the differences in

distances between the ciphertexts.

Definition 5.4.3 (Small jump attack): Consider the following PPT adversary $\mathcal{A}_{SJ}$ with three oracle queries in the experiment of security notion IND-OLCPA.

Adversary $\mathcal{A}_{SJ}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))}$

    $x \xleftarrow{\$} \{1, \ldots, m-3\}$

    $y_1 \leftarrow \mathcal{E}_k(\mathcal{LR}(x, x, b))$

    $y_2 \leftarrow \mathcal{E}_k(\mathcal{LR}(x+1, x+2, b))$

    $y_3 \leftarrow \mathcal{E}_k(\mathcal{LR}(x+3, x+3, b))$

    Return 1 if $y_3 - y_2 < y_2 - y_1$; else return 0.

In the small jump attack given above, the left plaintexts are x, x+1, and x+3, and the

corresponding right plaintexts are x, x+2, and x+3, where x is randomly selected from {1, ...,

m−3}. The following lemma shows that the small jump attack can distinguish these two cases

with non-negligible probability.
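The attack can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: `toy_ope` is a hypothetical linear order-preserving map (all ciphertext distances equal, so condition (5.4) of the proof below holds everywhere), and `run_experiment` plays the role of the left-or-right oracle.

```python
import random

def lr(x0, x1, b):
    """Left-or-right selector: returns the right plaintext when b = 1."""
    return x1 if b else x0

def small_jump_adversary(enc_oracle, m, rng):
    """Three-query adversary of Definition 5.4.3; returns its guess for b."""
    x = rng.randint(1, m - 3)
    y1 = enc_oracle(x, x)
    y2 = enc_oracle(x + 1, x + 2)
    y3 = enc_oracle(x + 3, x + 3)
    return 1 if y3 - y2 < y2 - y1 else 0

def toy_ope(x, key=(7, 3)):
    """Hypothetical OPE used only for illustration: E(x) = a*x + c."""
    a, c = key
    return a * x + c

def run_experiment(b, m=1000, seed=0):
    """Run the adversary against the toy scheme with hidden bit b."""
    rng = random.Random(seed)
    oracle = lambda x0, x1: toy_ope(lr(x0, x1, b))
    return small_jump_adversary(oracle, m, rng)
```

Against this toy scheme the adversary recovers b for every choice of x, because the middle ciphertext sits closer to the first ciphertext when b = 0 and closer to the third when b = 1.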

Lemma 5.4.3: There is no efficient OPE scheme that is secure under IND-OLCPA (because of $\mathcal{A}_{SJ}$) if $m$ is a superpolynomial of $\lambda$. Specifically, there exists a polynomial $g$ such that

\[ \mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE},\mathcal{A}_{SJ}} \ge \frac{1}{g(\lambda) \cdot g_1(\lambda)}. \]

Proof. Let $d_i = \mathcal{E}(i+1, k) - \mathcal{E}(i, k)$ be the distance between two consecutive ciphertexts, $1 \le i < m$. Suppose that the adversary selects $x = i$ in the small jump attack. Then $y_3 - y_2 = d_{i+1} + d_{i+2}$ and $y_2 - y_1 = d_i$ if $b = 0$; $y_3 - y_2 = d_{i+2}$ and $y_2 - y_1 = d_i + d_{i+1}$ if $b = 1$. Therefore adversary $\mathcal{A}_{SJ}$ returns the correct $b$ if the following condition holds.

\[ d_i + d_{i+1} > d_{i+2} \ \text{ and } \ d_i < d_{i+1} + d_{i+2} \tag{5.4} \]

Consequently, adversary $\mathcal{A}_{SJ}$ may return an incorrect $b$ if either of the following two conditions (called small jump and small reverse-jump) holds.

\[ d_i + d_{i+1} \le d_{i+2} \tag{5.5} \]

\[ d_i \ge d_{i+1} + d_{i+2} \tag{5.6} \]


Note that condition (5.5) implies that the distance series increases at least as fast as the Fibonacci numbers, and condition (5.6) implies that the reversed distance series increases at least as fast as the Fibonacci numbers. Since the formula for the Fibonacci numbers is

\[ \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{i} - \left( \frac{1 - \sqrt{5}}{2} \right)^{i} \right] \]

and $\log d_i$ must be bounded by a polynomial, condition (5.5) (resp. condition (5.6)) cannot hold consecutively superpolynomially many times. Moreover, condition (5.6) cannot hold immediately after condition (5.5). Otherwise

\[ d_i + d_{i+1} \le d_{i+2} \ \text{ and } \ d_{i+1} \ge d_{i+2} + d_{i+3} \ \Rightarrow \ d_i + d_{i+1} + d_{i+2} + d_{i+3} \le d_{i+2} + d_{i+1} \ \Rightarrow \ d_i + d_{i+3} \le 0, \]

which is a contradiction.

Consider $\{(d_i, d_{i+1}, d_{i+2}) \mid 1 \le i \le m-3\}$. Suppose that $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.5) or condition (5.6), and $m-3-i$ is a superpolynomial. Since condition (5.5) (resp. condition (5.6)) cannot hold consecutively superpolynomially many times and condition (5.6) cannot hold immediately after condition (5.5), there must exist a polynomial $g_i$ such that $(d_{i+g_i}, d_{i+1+g_i}, d_{i+2+g_i})$ satisfies condition (5.4). Hence the points in the set

\[ \{ i \mid (d_i, d_{i+1}, d_{i+2}) \text{ satisfies condition (5.4)} \} \]

partition $[m]$ into polynomial-sized segments. Let $g \cdot g_1$ be the maximum polynomial. Then there are at least $\frac{m-3}{g \cdot g_1}$ many $i$'s such that $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.4). Since adversary $\mathcal{A}_{SJ}$ returns the correct $b$ if it selects $x = i$ and $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.4),

\[ \mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE},\mathcal{A}_{SJ}} \ge \frac{1}{m-3} \cdot \frac{m-3}{g(\lambda) \cdot g_1(\lambda)} = \frac{1}{g(\lambda) \cdot g_1(\lambda)}. \]


Proposition 5.4.4: If the adversary repeats the small jump attack, then the lower bound on

the advantage of the adversary will become 1/g.

Proof. Since the range of plaintexts in the oracle queries is bounded by $g_1$, the probability that some $i$ in the set $\{ i \mid (d_i, d_{i+1}, d_{i+2}) \text{ satisfies condition (5.4)} \}$ from the proof of Lemma 5.4.3 falls into the range is at least $\frac{g_1}{m-3} \cdot \frac{m-3}{g \cdot g_1} = 1/g$. Therefore the lower bound on the advantage of the adversary increases to $1/g$.

5.4.3 Ideal OPE and GOPE in the Superpolynomial-sized Domain

According to the same adversaries 𝒜 in Proposition 5.3.4, the ideal OPE object 𝒮𝒠∗ does

not achieve the lower bound on the advantage of the adversary 1/g (Proposition 5.4.4) under

IND-OLCPA in superpolynomial-sized domains. Next we design a GOPE scheme 𝒮𝒠3 =

(𝒦3, 𝒠3, 𝒟3, 𝒞3) in the superpolynomial-sized domain, and prove that it achieves that lower

bound. 𝒮𝒠3 is constructed based on two building blocks: 𝒮𝒠4 and 𝒮𝒠5. 𝒮𝒠4 is adapted from 𝒮𝒠1

and it is secure under IND-OLCPA; but it can only support “local” comparisons (i.e.

comparisons for ciphertexts whose plaintexts are close together). The ciphertexts of 𝒮𝒠5 have proper

remote order to support “remote” comparisons (i.e. comparisons for ciphertexts whose plaintexts

are far apart).

First we design 𝒮𝒠4 = (𝒦4, 𝒠4, 𝒟4, 𝒞4). Let g2 denote a polynomial. In 𝒮𝒠4 the ciphertext

of x can be compared with g2(λ) − 1 (instead of m − 1) other ciphertexts whose plaintexts are

close to x.


- $\mathcal{K}_4$: Given the domain size $m$, it randomly picks a permutation $\pi$ of the set $\{(x, x') \mid 1 \le x < x' \le m\}$, and randomly generates $r_{xx'} \in Z_3$ for $1 \le x < x' \le m$. It returns $\pi$ together with $\{r_{xx'} \mid 1 \le x < x' \le m\}$;

- $\mathcal{E}_4$: For plaintext $x$, it returns the ciphertext

  $y = \{(\pi(x', x), r_{x'x}) \mid x' < x\} \cup \{(\pi(x, x'), 1 + r_{xx'}) \mid x < x' \le g_2(\lambda)\}$ if $x \le g_2(\lambda)/2$;

  $y = \{(\pi(x', x), r_{x'x}) \mid x - g_2(\lambda)/2 < x' < x\} \cup \{(\pi(x, x'), 1 + r_{xx'}) \mid x < x' \le x + g_2(\lambda)/2\}$ if $g_2(\lambda)/2 < x < m - g_2(\lambda)/2$;

  $y = \{(\pi(x', x), r_{x'x}) \mid m - g_2(\lambda) \le x' < x\} \cup \{(\pi(x, x'), 1 + r_{xx'}) \mid x < x'\}$ if $x \ge m - g_2(\lambda)/2$;

- $\mathcal{D}_4$: For ciphertext $y$, it retrieves (any) two elements $(i, s)$ and $(i', s')$ from the set $y$, and returns the plaintext $x$ which appears in both $\pi^{-1}(i)$ and $\pi^{-1}(i')$;

- $\mathcal{C}_4$: For ciphertexts $y$ and $y'$, if $y = y'$, it returns $=$. Otherwise, it retrieves $(i, s)$ from the set $y$ and $(i, s')$ from the set $y'$. If $s - s' = 1$, it returns $<$. If $s - s' = 2$, it returns $>$. (The arithmetic is in $Z_3$.)
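The encryption and comparison algorithms above can be sketched as follows. This is a simplified illustration under explicit assumptions rather than the scheme itself: the permutation is replaced by an HMAC-based label (so decryption is not supported), $r_{xx'}$ is derived from the same keyed hash (with a slight bias from the reduction modulo 3), and a symmetric window of `half_window` neighbours on each side, clipped at the domain boundaries, stands in for the three boundary cases. All function and parameter names are hypothetical.

```python
import hashlib
import hmac

def _prf(key: bytes, tag: str, lo: int, hi: int) -> bytes:
    """Keyed hash over the unordered pair (lo, hi); stands in for the permutation / r values."""
    return hmac.new(key, f"{tag}|{lo}|{hi}".encode(), hashlib.sha256).digest()

def encrypt(key: bytes, x: int, m: int, half_window: int) -> dict:
    """Ciphertext of x: a map from pair labels to values in Z_3."""
    ct = {}
    for other in range(max(1, x - half_window), min(m, x + half_window) + 1):
        if other == x:
            continue
        lo, hi = min(x, other), max(x, other)
        label = _prf(key, "label", lo, hi)
        r = _prf(key, "r", lo, hi)[0] % 3
        # The smaller plaintext of the pair stores 1 + r, the larger stores r.
        ct[label] = (r + 1) % 3 if x == lo else r
    return ct

def compare(y: dict, y2: dict):
    """Return '<', '>' or '='; None when the ciphertexts share no label."""
    if y == y2:
        return "="
    for label in y.keys() & y2.keys():
        diff = (y[label] - y2[label]) % 3
        if diff == 1:
            return "<"
        if diff == 2:
            return ">"
    return None
```

Two ciphertexts share a label exactly when their plaintexts lie within the window of each other, which is the "local" comparison property; for plaintexts that are far apart, `compare` returns `None` and a separate mechanism is needed.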

The correctness, security, and efficiency of 𝒮𝒠4 are presented in Lemmas 5.4.5 and 5.4.6.

Lemma 5.4.5: The decryption of $\mathcal{SE}_4$ is correct. Also, for plaintexts $x_1, x_2 \in [m]$, the comparison of $\mathcal{E}_4(x_1, k)$ and $\mathcal{E}_4(x_2, k)$ is correct if $|x_1 - x_2| \le (g_2(\lambda) - 1)/2$.

Proof. The correctness of the decryption can be easily verified. Note that the ciphertext of $x_1$ (resp. $x_2$) can be compared with the $g_2(\lambda) - 1$ other ciphertexts whose plaintexts are close to $x_1$ (resp. $x_2$). Hence $\mathcal{E}_4(x_1, k)$ and $\mathcal{E}_4(x_2, k)$ are comparable if $|x_1 - x_2| \le (g_2(\lambda) - 1)/2$. The correctness of the comparison follows from the proof of Lemma 5.4.1.


Lemma 5.4.6: Suppose that the range of oracle queries under IND-OLCPA is bounded by

polynomial g1. Then 𝒮𝒠4 is secure under IND-OLCPA if g2 ≥ 2g1 + 1. Furthermore, 𝒮𝒠4 can be

revised to achieve efficiency and remain secure under IND-OLCPA.

Proof. The security proof is analogous to that of Theorem 5.4.2. It is worth noting that the condition $g_2 \ge 2g_1 + 1$ is used in the inductive step to guarantee that two conditional probability distributions are identical. The detailed proof is as follows.

Assume that the adversary queries $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$ under IND-OLCPA. According to the restriction condition (5.2) under IND-OLCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since querying two identical plaintext pairs does not increase the advantage, it suffices to consider $x_1^0 < x_2^0 < \cdots < x_h^0$ and $x_1^1 < x_2^1 < \cdots < x_h^1$. Hence, the adversary views $(\mathcal{E}_4(x_1^0, k), \ldots, \mathcal{E}_4(x_h^0, k))$ for $b = 0$, and $(\mathcal{E}_4(x_1^1, k), \ldots, \mathcal{E}_4(x_h^1, k))$ otherwise. It suffices to prove that these two probability distributions are identical because it implies that $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE}_4,\mathcal{A}} = 0$.

We use mathematical induction on $h$ to prove that the two probability distributions $(\mathcal{E}_4(x_1^0, k), \ldots, \mathcal{E}_4(x_h^0, k))$ and $(\mathcal{E}_4(x_1^1, k), \ldots, \mathcal{E}_4(x_h^1, k))$ are identical. For $h = 1$, it is necessary to show that the probability distribution $\mathcal{E}_4(x_1^0, k)$ equals the probability distribution $\mathcal{E}_4(x_1^1, k)$. Let $\Pi = \{(x, x') \mid 1 \le x < x' \le m\}$. Let $I_j$, $1 \le j \le g_2(\lambda) - 1$, be the probability distribution such that

\[ \Pr[I_1 = i_1, \ldots, I_{g_2(\lambda)-1} = i_{g_2(\lambda)-1}] = \frac{1}{\prod_{0 \le j \le g_2(\lambda)-2} (|\Pi| - j)} \]

for $(i_1, \ldots, i_{g_2(\lambda)-1}) \in \Pi^{g_2(\lambda)-1}$ with $i_j \ne i_{j'}$ if $j \ne j'$. Let $S_j$, $1 \le j \le g_2(\lambda) - 1$, be the uniform distribution on $Z_3$. Then, according to the construction of $\mathcal{E}_4$,

\[ \mathcal{E}_4(x_1^0, k) = \{(I_j, S_j) \mid 1 \le j \le g_2(\lambda) - 1\} = \mathcal{E}_4(x_1^1, k). \]


We assume that the two probability distributions are identical for $h < h'$. For $h = h'$, we consider the following two conditional probability distributions

\[ X = \mathcal{E}_4(x_{h'}^0, k) \mid \mathcal{E}_4(x_1^0, k) = y_1, \ldots, \mathcal{E}_4(x_{h'-1}^0, k) = y_{h'-1} \]

and

\[ Y = \mathcal{E}_4(x_{h'}^1, k) \mid \mathcal{E}_4(x_1^1, k) = y_1, \ldots, \mathcal{E}_4(x_{h'-1}^1, k) = y_{h'-1} \]

where $y_u = \{(i_u^j, s_u^j) \in \Pi \times Z_3 \mid 1 \le j \le g_2(\lambda) - 1\}$, $1 \le u \le h'-1$. For oracle queries $x_u^i$ and $x_v^j$, since $|x_u^i - x_v^j| \le g_1(\lambda) \le (g_2(\lambda) - 1)/2$, they are comparable according to Lemma 5.4.5. So $y_1, \ldots, y_{h'-1}$ affect $\mathcal{E}_4(x_{h'}^0, k)$ (resp. $\mathcal{E}_4(x_{h'}^1, k)$). First, for $1 \le u \le h'-1$, there exists a unique $i_u^j$ (for some $j$) that appears in $\mathcal{E}_4(x_{h'}^0, k)$ (resp. $\mathcal{E}_4(x_{h'}^1, k)$). On the other hand, there exists a unique $i_u^{j'}$ (for some $j'$) that appears in $y_{u'}$, $1 \le u' \ne u \le h'-1$; hence those $i_u^{j'}$ will not appear in $\mathcal{E}_4(x_{h'}^0, k)$ (resp. $\mathcal{E}_4(x_{h'}^1, k)$). Thus, let

\[ \Pi_u = \{ i_u^j \mid i_u^j \text{ appears in } y_u \text{ but does not appear in } y_{u'} \text{ for any } 1 \le u' \ne u \le h'-1 \}, \]

for $1 \le u \le h'-1$. Then there exists $i_u^j \in \Pi_u$ such that $(i_u^j, s_u^j - 1)$ appears in $\mathcal{E}_4(x_{h'}^0, k)$ (resp. $\mathcal{E}_4(x_{h'}^1, k)$), $1 \le u \le h'-1$. Since the elements of a set are unordered, without loss of generality, for $1 \le u \le h'-1$, let $(I_u, S_u)$ be the probability distribution such that $\Pr[(I_u, S_u) = (i_u^j, s_u^j - 1)] = 1/|\Pi_u|$. The remaining probability distributions are similar to those for the case $h = 1$. Let

\[ \bar{\Pi} = \{ (x, x') \in \Pi \mid (x, x') \text{ does not appear in } y_u,\ 1 \le u \le h'-1 \}. \]

For $h' \le u \le g_2(\lambda) - 1$, let $I_u$ be the probability distribution such that

\[ \Pr[I_{h'} = i_{h'}, \ldots, I_{g_2(\lambda)-1} = i_{g_2(\lambda)-1}] = \frac{1}{\prod_{0 \le j \le g_2(\lambda)-h'-1} (|\bar{\Pi}| - j)} \]

for $(i_{h'}, \ldots, i_{g_2(\lambda)-1}) \in \bar{\Pi}^{g_2(\lambda)-h'}$ with $i_u \ne i_{u'}$ if $u \ne u'$. Let $S_u$, $h' \le u \le g_2(\lambda) - 1$, be the uniform distribution on $Z_3$. Then

\[ X = \{(I_u, S_u) \mid 1 \le u \le g_2(\lambda) - 1\} = Y. \tag{5.7} \]

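Consequently, since $X$ and $Y$ are the conditional distributions of the last ciphertext given the first $h'-1$ ciphertexts,

\[ \Pr[\mathcal{E}_4(x_1^0, k) = y_1, \ldots, \mathcal{E}_4(x_{h'}^0, k) = y_{h'}] \]

\[ = \Pr[X = y_{h'}] \cdot \Pr[\mathcal{E}_4(x_1^0, k) = y_1, \ldots, \mathcal{E}_4(x_{h'-1}^0, k) = y_{h'-1}] \]

\[ = \Pr[Y = y_{h'}] \cdot \Pr[\mathcal{E}_4(x_1^1, k) = y_1, \ldots, \mathcal{E}_4(x_{h'-1}^1, k) = y_{h'-1}] \quad \text{(induction hypothesis and (5.7))} \]

\[ = \Pr[\mathcal{E}_4(x_1^1, k) = y_1, \ldots, \mathcal{E}_4(x_{h'}^1, k) = y_{h'}]. \]

Hence the two probability distributions are identical for $h = h'$, which completes the induction.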

To make $\mathcal{SE}_4$ efficient, the random permutation and its inverse can be substituted with deterministic symmetric-key encryption and decryption algorithms, and $r_{xx'}$ can be generated by a pseudorandom number generator. The revised scheme is efficient and remains secure under IND-OLCPA.

Note that the original 𝒮𝒠4 is given because it is easier to understand its GOPE

construction. It is revised to achieve better efficiency. For convenience, from this point onwards,

𝒮𝒠4 refers to the revised version. Next we design 𝒮𝒠5 = (𝒦5, 𝒠5, 𝒞5) . Since 𝒮𝒠4 supports

decryption and “local” comparisons, 𝒮𝒠5 does not need a decryption algorithm but should

support “remote” comparisons. In order to assure security, the ciphertexts should have small

statistical distances if the corresponding plaintexts are close to each other. To achieve this, for $1 \le i \le l$, $\mathcal{E}_5(i, k)$ are randomly selected from $[n']$, where $n'$ and $l$ are positive integers. Then the subsequent ciphertexts are gradually increased. The construction of $\mathcal{SE}_5$ is as follows.

- $\mathcal{K}_5$: It randomly selects $r_i \in [n']$ for $0 \le i \le l-1$ and returns $(r_0, \ldots, r_{l-1})$;

- $\mathcal{E}_5$: For plaintext $x \in [m]$, we compute $a$ and $b$, $a \ge 0$ and $0 \le b < l$, such that $x - 1 = a \cdot l + b$. $\mathcal{E}_5$ returns ciphertext $y = r_b + a$;

- $\mathcal{C}_5$: For ciphertexts $y$ and $y'$, if $y > y'$, it returns $>$; if $y < y'$, it returns $<$.
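The three algorithms are short enough to sketch directly. The Python names below (`keygen`, `encrypt`, `compare`, `n_prime`) are illustrative assumptions; `random.Random` is used only to make the sketch reproducible and is not a cryptographic source of randomness.

```python
import random

def keygen(l: int, n_prime: int, rng: random.Random) -> list:
    """K5: pick r_0, ..., r_{l-1} uniformly from [n'] = {1, ..., n'}."""
    return [rng.randint(1, n_prime) for _ in range(l)]

def encrypt(key: list, x: int) -> int:
    """E5: write x - 1 = a*l + b with 0 <= b < l and return r_b + a."""
    a, b = divmod(x - 1, len(key))
    return key[b] + a

def compare(y: int, y2: int):
    """C5: plain integer comparison; None when the ciphertexts are equal."""
    if y > y2:
        return ">"
    if y < y2:
        return "<"
    return None
```

With l = 4 and n' = 5, for example, any two plaintexts at distance at least n'·l + l = 24 are ordered correctly by `compare`, whereas nearby plaintexts may be ordered arbitrarily because their ciphertexts are dominated by the random offsets.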

The correctness of 𝒮𝒠5 is presented in Lemma 5.4.7.

Lemma 5.4.7: For plaintexts $x_1, x_2 \in [m]$, the comparison of $\mathcal{E}_5(x_1, k)$ and $\mathcal{E}_5(x_2, k)$ is correct if $|x_1 - x_2| \ge n' \cdot l + l$.

Proof. Without loss of generality, we assume that $x_1 < x_2$. Then $x_2 - x_1 \ge n' \cdot l + l$. Let $x_i - 1 = a_i \cdot l + b_i$ with $a_i \ge 0$ and $0 \le b_i < l$, $1 \le i \le 2$. Then

\[ n' \cdot l + l \le x_2 - x_1 = (a_2 - a_1) \cdot l + (b_2 - b_1) < (a_2 - a_1) \cdot l + l \ \Rightarrow \ a_2 - a_1 > n'. \]

Hence,

\[ \mathcal{E}_5(x_1, k) = r_{b_1} + a_1 < r_{b_1} + (a_2 - n') = (r_{b_1} - n') + a_2 < r_{b_2} + a_2 = \mathcal{E}_5(x_2, k), \]

which implies that the comparison is correct.

If the queries by the adversary against 𝒮𝒠5 are in the interval [c∙l + 1, (c+1)∙l], for some c ≥

0, then the adversary cannot distinguish the corresponding ciphertexts because they are

independent identical random variables generated by 𝒠5. If the queries involve plaintexts in two

consecutive intervals [c∙l + 1, (c+1)∙l] and [(c+1)∙l + 1, (c+2)∙l], then the advantage of the

adversary is not 0, but it can be controlled by l and n’. The security of 𝒮𝒠5 is given in the

following Lemma.


Lemma 5.4.8: Suppose that the range of oracle queries under IND-OLCPA is bounded by polynomial $g_1$. For polynomial $g \ge 1$, $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE}_5,\mathcal{A}} \le \frac{1}{g(\lambda)}$ if $l > g_1(\lambda)$ and $n' \ge g(\lambda) \cdot g_1(\lambda)$.

Proof. Assume that the adversary queries $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$ under IND-OLCPA. According to the restriction condition (5.2) under IND-OLCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since querying two identical plaintext pairs does not increase the advantage, it suffices to consider $x_1^0 < x_2^0 < \cdots < x_h^0$ and $x_1^1 < x_2^1 < \cdots < x_h^1$. Let $x_u^i - 1 = a_u^i \cdot l + b_u^i$ with $a_u^i \ge 0$ and $0 \le b_u^i < l$; then $\mathcal{E}_5(x_u^i, k) = r_{b_u^i} + a_u^i$, $1 \le u \le h$ and $0 \le i \le 1$. Hence, the adversary views $(r_{b_1^0} + a_1^0, \ldots, r_{b_h^0} + a_h^0)$ for $b = 0$, and $(r_{b_1^1} + a_1^1, \ldots, r_{b_h^1} + a_h^1)$ otherwise. Let $\Delta$ be the statistical distance between $(r_{b_1^0} + a_1^0, \ldots, r_{b_h^0} + a_h^0)$ and $(r_{b_1^1} + a_1^1, \ldots, r_{b_h^1} + a_h^1)$. Since $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE}_5,\mathcal{A}} \le \Delta$, it suffices to prove that $\Delta \le \frac{1}{g(\lambda)}$.

We study the properties of those probability distributions. Since

\[ |(a_u^i - a_v^j) \cdot l + (b_u^i - b_v^j)| = |x_u^i - x_v^j| \le g_1(\lambda) < l, \]

it follows that $|a_u^i - a_v^j| \le 1$ and $b_u^i = b_v^j \Rightarrow a_u^i = a_v^j$, $1 \le u, v \le h$ and $0 \le i, j \le 1$. Furthermore, $b_u^i = b_v^j \Rightarrow a_u^i = a_v^j$ together with $x_u^0 \ne x_v^0$ if $u \ne v$ implies that $b_u^0 \ne b_v^0$ if $u \ne v$. Therefore $r_{b_1^0} + a_1^0, \ldots, r_{b_h^0} + a_h^0$ are independent uniform distributions on $[n'] + a_1^0, \ldots, [n'] + a_h^0$. Similarly, $r_{b_1^1} + a_1^1, \ldots, r_{b_h^1} + a_h^1$ are independent uniform distributions on $[n'] + a_1^1, \ldots, [n'] + a_h^1$. Hence $\Delta$ equals the statistical distance between independent uniform distributions $X_1, \ldots, X_h$ on $[n'] + a_1^0, \ldots, [n'] + a_h^0$ and independent uniform distributions $Y_1, \ldots, Y_h$ on $[n'] + a_1^1, \ldots, [n'] + a_h^1$, i.e.

\[ \Delta = \frac{1}{2} \sum_{w_u \in ([n'] + a_u^0) \cup ([n'] + a_u^1),\ 1 \le u \le h} \Big| \Pr[(X_1, \ldots, X_h) = (w_1, \ldots, w_h)] - \Pr[(Y_1, \ldots, Y_h) = (w_1, \ldots, w_h)] \Big|. \]

Since $|a_u^0 - a_u^1| \le 1$ for $1 \le u \le h$,

\[ \Delta \le \frac{h \cdot n'^{\,h-1} + h \cdot n'^{\,h-1}}{2 n'^{\,h}} = \frac{h}{n'} \le \frac{g_1(\lambda)}{n'} \le \frac{1}{g(\lambda)}. \]
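The bound $\Delta \le h/n'$ can be checked numerically for small parameters. The sketch below is an illustration only: it assumes the coordinates are independent (as established in the proof), enumerates the joint support exhaustively, and uses hypothetical names (`shifted_uniform`, `statistical_distance`, `shifts0`, `shifts1` for the offsets $a_u^0$ and $a_u^1$).

```python
from fractions import Fraction
from itertools import product

def shifted_uniform(n_prime: int, shift: int) -> dict:
    """Uniform distribution on [n'] + shift = {1 + shift, ..., n' + shift}."""
    return {w + shift: Fraction(1, n_prime) for w in range(1, n_prime + 1)}

def statistical_distance(shifts0, shifts1, n_prime: int) -> Fraction:
    """Exact statistical distance between two products of shifted uniforms."""
    left = [shifted_uniform(n_prime, s) for s in shifts0]
    right = [shifted_uniform(n_prime, s) for s in shifts1]
    supports = [sorted(set(p) | set(q)) for p, q in zip(left, right)]
    total = Fraction(0)
    for w in product(*supports):
        p = q = Fraction(1)
        for wu, pu, qu in zip(w, left, right):
            p *= pu.get(wu, 0)
            q *= qu.get(wu, 0)
        total += abs(p - q)
    return total / 2
```

For h = 2 and n' = 10, shifting both coordinates by one gives a distance of 19/100, which is below the bound h/n' = 2/10 used in the proof.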

𝒮𝒠3 = (𝒦3, 𝒠3, 𝒟3, 𝒞3) is constructed by combining 𝒮𝒠4 and 𝒮𝒠5. In order to achieve full

comparison capability, g2, l, and n’ are chosen to satisfy the condition (g2 – 1)/2 ≥ n’∙l + l

(Lemmas 5.4.5 and 5.4.7). In order to achieve security, g2, l, and n’ are chosen to satisfy the

conditions g2 ≥ 2g1 + 1, l > g1, and n’ ≥ g ∙ g1 (Lemmas 5.4.6 and 5.4.8). We can solve these

inequalities, and get l > g1, n’ ≥ g ∙ g1, and g2 ≥ max{2(n’∙l + l) + 1, 2g1 + 1} = 2(n’∙l + l) + 1.

Specifically, we can set l = g1 + 1, n’ = g ∙ g1, and g2 = 2(n’∙l + l) + 1. 𝒮𝒠3 encrypts plaintext x

into (𝒠4(𝑥, 𝑘), 𝒠5(𝑥, 𝑘)). Since g and g1 are polynomials, 𝒮𝒠3 is an efficient encryption scheme.

Given two ciphertexts (𝒠4(𝑥1, 𝑘) , 𝒠5(𝑥1, 𝑘) ) and (𝒠4(𝑥2, 𝑘) , 𝒠5(𝑥2, 𝑘) ), 𝒮𝒠3 first compares

𝒠4(𝑥1, 𝑘) and 𝒠4(𝑥2, 𝑘) by using 𝒞4 ; if it fails, 𝒮𝒠3 then compares 𝒠5(𝑥1, 𝑘) and 𝒠5(𝑥2, 𝑘) by

using 𝒞5. Also, 𝒠4(𝑥, 𝑘) can be decrypted by 𝒟4. We summarize these results in the following

theorem.
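The parameter selection described above amounts to a few arithmetic steps. The following sketch evaluates it for concrete values of $g(\lambda)$ and $g_1(\lambda)$; the function name `choose_parameters` and its integer arguments are assumptions introduced for illustration.

```python
def choose_parameters(g: int, g1: int):
    """Return (l, n', g2) for given values of g(lambda) and g1(lambda)."""
    l = g1 + 1                      # l > g1
    n_prime = g * g1                # n' >= g * g1
    g2 = 2 * (n_prime * l + l) + 1  # full comparison capability
    # Security conditions (Lemmas 5.4.6 and 5.4.8).
    assert g2 >= 2 * g1 + 1 and l > g1 and n_prime >= g * g1
    # Comparison condition (Lemmas 5.4.5 and 5.4.7).
    assert (g2 - 1) // 2 >= n_prime * l + l
    return l, n_prime, g2
```

For example, with g = 3 and g1 = 4 the choice is l = 5, n' = 12 and g2 = 131, so every pair of plaintexts is covered either by the local comparison (distance at most 65) or by the remote comparison (distance at least 65).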

Theorem 5.4.9: Suppose that the range of oracle queries under IND-OLCPA is bounded by polynomial $g_1$. For any polynomial $g \ge 1$, there exists an efficient GOPE scheme $\mathcal{SE}_3$ such that $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{SE}_3,\mathcal{A}} \le \frac{1}{g(\lambda)}$.

Proof. The proof is based on Lemmas 5.4.5, 5.4.6, 5.4.7, and 5.4.8.


5.5 Overview of OPE to Multi-user Systems

5.5.1 Settings

As presented in Chapter 3, we consider the system which consists of a single server DB

hosting a database and a set of users U = {Uj | j ≥ 1} accessing the data stored on DB. A set of

key agents in KA manage the key and mediate the communication between the users and the DB.

For convenience, we assume that there are only numerical data in DB. Data of other types

can be represented by numerical data easily. For each critical data item x, the DB maintains two

ciphertexts COPE(x) and CCE(x). COPE(x) is encrypted using a specialized OPE scheme with a

master key k. Note that the existing OPE scheme cannot be used directly to support multi-user

systems and we develop a general approach to adapt any existing OPE scheme into a

corresponding digit based OPE (DOPE) scheme. The cipher COPE(x) of a data item x is

encrypted using DOPE.

CCE(x) is encrypted using a classical encryption scheme (e.g. AES). The purpose of storing

CCE(x) is to support efficient transmission of responses. For each data item x, a different data key

dkx is used to generate CCE(x). A user with access privilege to data item x will be granted key dkx.

In real implementation, the data items with the same access privileges can be grouped together

into an access domain and only one key is needed for each access domain.

5.5.2 Adjusted Definition of OPE

Note that in the base-$b$ number system, the digits are in the set $\{0, \ldots, b-1\}$. To facilitate the construction of DOPE, we adjust the plaintext domain and ciphertext range. Let the plaintext domain be $\{0,1\}^{\lambda} = \{0, \ldots, 2^{\lambda}-1\}$ (instead of $[m] = \{i \mid 1 \le i \le m\}$) and the ciphertext range be $\{0,1\}^{\mu} = \{0, \ldots, 2^{\mu}-1\}$ (instead of $[n] = \{j \mid 1 \le j \le n\}$). The definition of an OPE scheme is revised accordingly as follows.

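Definition 5.5.1 (Revised OPE Scheme): Let $\mathcal{SE}^{\lambda,\mu} = (\mathcal{K}^{\lambda,\mu}, \mathcal{E}^{\lambda,\mu}, \mathcal{D}^{\lambda,\mu})$ be an encryption scheme, where $\mathcal{K}^{\lambda,\mu}: \{0,1\}^* \to \{0,1\}^*$ is the key generation algorithm, $\mathcal{E}^{\lambda,\mu}: \{0,1\}^{\lambda} \times \{0,1\}^* \to \{0,1\}^{\mu}$ is the encryption algorithm, and $\mathcal{D}^{\lambda,\mu}: \{0,1\}^{\mu} \times \{0,1\}^* \to \{0,1\}^{\lambda}$ is the decryption algorithm. We say that $\mathcal{SE}^{\lambda,\mu}$ is an OPE scheme if $\mathcal{E}^{\lambda,\mu}$ satisfies the “order-preserving property”:

\[ x_1 < x_2 \ \Rightarrow \ \mathcal{E}^{\lambda,\mu}(x_1, k) < \mathcal{E}^{\lambda,\mu}(x_2, k) \]

for any $x_1, x_2 \in \{0,1\}^{\lambda}$ and key $k$.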

Generally, the value of μ could impact the security of 𝒠λ,μ. But μ must be bounded by a

polynomial of λ to keep the efficiency of 𝒮𝒠λ,μ.

5.5.3 Problem Specification, Adversary Model, and Security Requirement

We construct a new OPE approach for multi-user systems. The approach includes a new

OPE construction that is tightly coupled with a request communication protocol Q and a

response communication protocol P.

In the request protocol Q, a user Ui issues a request (query) q to the DB, where q may

contain some secret data that needs to be transmitted with q to the DB. For simplicity, assume

that there is only one secret data item in q and let x denote that data item. Protocol Q should

transfer q to DB while ensuring the correct and secure computation of COPE(x) and CCE(x) in the


request transmission process. (Note that Ui can encrypt x using dkx and obtain CCE(x). But since

Ui does not have the OPE master key k, it is not possible for Ui to compute COPE(x). Thus, a set

of key agents (KA) are introduced to perform the OPE encryption.)

In the response protocol P, the DB sends back the response r to the user. The response r may include a set of encrypted data objects $\{C_{CE}(y_1), C_{CE}(y_2), \ldots, C_{CE}(y_t)\}$ and/or $\{C_{OPE}(y_1), C_{OPE}(y_2), \ldots, C_{OPE}(y_t)\}$ (the protocol decides whether to send $C_{CE}(y_i)$ or $C_{OPE}(y_i)$ or both). Protocol P should ensure the secure delivery of r to $U_i$ and that $U_i$ can decrypt the information in r to obtain the query results $y_1, y_2, \ldots, y_t$.

Protocols Q and P have certain security requirements. The system entities, users, DB, and

key agents, may collude to acquire additional information. We unify the possible collusions and

construct a passive adversary 𝒜 who tries to gain extra information by compromising some

entities in the system. We assume that the key agents and DB are better protected than the users.

Therefore we assume that the adversary cannot compromise all key agents simultaneously. Thus,

we assume the adversary structure

AS = {𝑈𝒜 ∪ 𝐾𝐴𝒜, 𝑈𝒜 ∪ 𝐾𝐴𝒜 ∪ {𝐷𝐵} | 𝑈𝒜 ⊂ 𝑈, 𝐾𝐴𝒜 ⊂ 𝐾𝐴},

where 𝑈𝒜 is the set of compromised users and 𝐾𝐴𝒜 is the set of compromised key agents (note

that 𝑈𝒜 and 𝐾𝐴𝒜 could be empty). The system should ensure the security requirement

Pr[𝒜(View) = x] = ν(λ) is satisfied, where ν denotes a negligible function and View is the

instance event randomly selected from the event space of what the adversary 𝒜 can observe in

the system by compromising entities in AS. Let U(x) denote the set of users who can access the

critical data x. Assume that none of the users in U(x) are compromised by 𝒜 . The security


requirement can be interpreted as: for critical data x, if 𝒜 does not compromise the users in U(x),

then the probability for 𝒜 to retrieve x based on the information gathered from the compromised

entities is negligible.

5.5.4 Our Approach

We design a simple and effective response protocol P to deliver the responses efficiently. We simply include $C_{CE}(y_1), C_{CE}(y_2), \ldots, C_{CE}(y_t)$ in r. The user should have access rights to $y_1, y_2, \ldots, y_t$ and, hence, should have the encryption keys $dk_{y_1}, dk_{y_2}, \ldots, dk_{y_t}$ to decrypt the data items in r. Consider the security of the system against adversary $\mathcal{A}$ (assume that $\mathcal{A}$ has not compromised the users in $U_{y_1}, \ldots, U_{y_t}$, where $U_{y_j}$ is the set of users who can access $y_j$). Since the protocol only transfers $C_{CE}(y_1), C_{CE}(y_2), \ldots, C_{CE}(y_t)$, $\mathcal{A}$ cannot get the encryption keys $dk_{y_1}, dk_{y_2}, \ldots, dk_{y_t}$ and cannot compromise $y_1, y_2, \ldots, y_t$. Note that the design of P is fully discussed here and will not be discussed further.

The request communication protocol Q cannot avoid the OPE encryption and is more complex. We design two protocols for Q: basic-DOPE and OE-DOPE, where OE-DOPE offers better security protection for the secret data x in request q during the communication process. The basic-DOPE and OE-DOPE protocols are discussed in Sections 5.6 and 5.7, respectively. Both protocols can be used with any existing OPE scheme.

5.6 The Basic DOPE Protocol for Multi-user Systems

In the basic-DOPE protocol, we use p key agents, $KA_0, \ldots, KA_{p-1}$, to encrypt confidential data x into $C_{OPE}(x)$. The critical data x is divided into p “digits”. The i-th “digit” is sent to $KA_i$ to be encrypted by the underlying OPE using a key $k_i$, $0 \le i < p$. The encrypted digits are sent to DB and integrated into the ciphertext $C_{OPE}(x)$.

The basic-DOPE and OE-DOPE protocols are coupled with the encryption algorithm DOPE. In Subsection 5.6.1, we present the construction of the DOPE encryption algorithm. Then we prove the correctness and analyze the security of the DOPE encryption scheme in Subsections 5.6.2 and 5.6.3, respectively. The basic-DOPE protocol is introduced in Subsection 5.6.4.

5.6.1 Construction of the DOPE Encryption Scheme

We construct the DOPE encryption scheme $\mathcal{SE}_p^{\lambda,\mu} = (\mathcal{K}_p^{\lambda,\mu}, \mathcal{E}_p^{\lambda,\mu}, \mathcal{D}_p^{\lambda,\mu})$ based on OPE scheme $\mathcal{SE}^{\lambda',\mu'}$, where $\lambda = p \cdot \lambda'$ and $\mu = p \cdot \mu'$, as follows. The key generation algorithm $\mathcal{K}_p^{\lambda,\mu}$ invokes $\mathcal{K}^{\lambda',\mu'}$ to generate the OPE key $k$ including $p$ subkeys $k_j$, $0 \le j < p$. The encryption algorithm $\mathcal{E}_p^{\lambda,\mu}$ proceeds as follows: (1) it represents the plaintext as $p$ “digits” in the base-$2^{\lambda/p}$ number system, (2) it encrypts the $p$ “digits” by $\mathcal{E}^{\lambda',\mu'}$, and (3) it integrates the encrypted $p$ “digits” back into a single-value ciphertext in the base-$2^{\mu'}$ number system. Accordingly, the decryption algorithm $\mathcal{D}_p^{\lambda,\mu}$ uses the inverse process of $\mathcal{E}_p^{\lambda,\mu}$ to decrypt the ciphertext. We describe the processes of $\mathcal{SE}_p^{\lambda,\mu}$ in Figure 5.1.

$\mathcal{K}_p^{\lambda,\mu}$: Invoke $\mathcal{K}^{\lambda',\mu'}$ to generate the set of keys $k = \{k_0, \ldots, k_{p-1}\}$.

$\mathcal{E}_p^{\lambda,\mu}$: Input: plaintext $x$, $x \in \{0,1\}^{\lambda}$.

    Output: ciphertext $C_{OPE}(x)$, $C_{OPE}(x) \in \{0,1\}^{\mu}$.

    Let $\lambda' = \lambda/p$ and $\mu' = \mu/p$.

    Express $x$ in the base-$2^{\lambda'}$ number system, i.e., $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$, $0 \le x_j < 2^{\lambda'}$.

    $C_{OPE}(x) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$.

$\mathcal{D}_p^{\lambda,\mu}$: Input: ciphertext $C_{OPE}(x)$, $C_{OPE}(x) \in \{0,1\}^{\mu}$.

    Output: plaintext $x$, $x \in \{0,1\}^{\lambda}$.

    Let $\lambda' = \lambda/p$ and $\mu' = \mu/p$.

    Express $C_{OPE}(x)$ in the base-$2^{\mu'}$ number system, i.e., $C_{OPE}(x) = \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j$.

    Compute $x_j = \mathcal{D}^{\lambda',\mu'}(y_j, k_j)$, $0 \le j < p$, and $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$.

Figure 5.1. DOPE scheme $\mathcal{SE}_p^{\lambda,\mu} = (\mathcal{K}_p^{\lambda,\mu}, \mathcal{E}_p^{\lambda,\mu}, \mathcal{D}_p^{\lambda,\mu})$.
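The digit-wise construction can be illustrated with a short Python sketch. The underlying OPE here is a toy stand-in chosen only so the example runs: `toy_ope_table` builds a sorted random table from a seed and is neither the scheme of the cited work nor secure. The names `dope_encrypt`, `dope_decrypt`, `lam_d` (for $\lambda'$) and `mu_d` (for $\mu'$) are likewise assumptions.

```python
import random

def toy_ope_table(key: int, lam_d: int, mu_d: int) -> list:
    """Toy order-preserving map {0,1}^lam_d -> {0,1}^mu_d: a sorted random sample."""
    rng = random.Random(key)
    return sorted(rng.sample(range(2 ** mu_d), 2 ** lam_d))

def dope_encrypt(x: int, keys: list, lam_d: int, mu_d: int) -> int:
    """Encrypt each base-2^lam_d digit with its own key, recombine in base 2^mu_d."""
    c = 0
    for j, key in enumerate(keys):
        digit = (x >> (lam_d * j)) & ((1 << lam_d) - 1)
        c += toy_ope_table(key, lam_d, mu_d)[digit] << (mu_d * j)
    return c

def dope_decrypt(c: int, keys: list, lam_d: int, mu_d: int) -> int:
    """Split the ciphertext into base-2^mu_d digits and invert each one."""
    x = 0
    for j, key in enumerate(keys):
        y = (c >> (mu_d * j)) & ((1 << mu_d) - 1)
        x += toy_ope_table(key, lam_d, mu_d).index(y) << (lam_d * j)
    return x
```

Because every encrypted digit is smaller than $2^{\mu'}$, the most significant differing digit decides the order of two ciphertexts, which is why the combined map stays order-preserving.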

5.6.2 Correctness of the DOPE Encryption Scheme

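We analyze the correctness of $\mathcal{SE}_p^{\lambda,\mu}$ in Proposition 5.6.1.

Proposition 5.6.1: $\mathcal{SE}_p^{\lambda,\mu}$ is correct, i.e. $\mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = x$ and $\mathcal{E}_p^{\lambda,\mu}$ is an OPE algorithm.

Proof. First we prove that $\mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = x$. Let $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$ and $\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$. Then, according to the process shown in Figure 5.1, with $y_j = \mathcal{E}^{\lambda',\mu'}(x_j, k_j)$,

\[ \mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = \sum_{0 \le j < p} \mathcal{D}^{\lambda',\mu'}(y_j, k_j) \cdot (2^{\lambda'})^j = \sum_{0 \le j < p} \mathcal{D}^{\lambda',\mu'}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), k_j) \cdot (2^{\lambda'})^j = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j = x. \]

Now we prove that $\mathcal{E}_p^{\lambda,\mu}$ is an OPE algorithm. For $x_1 = \sum_{0 \le j < p} x_{1,j} \cdot (2^{\lambda'})^j$ and $x_2 = \sum_{0 \le j < p} x_{2,j} \cdot (2^{\lambda'})^j$, we consider three situations.

(i) $x_1 < x_2$. Then $\exists j_0$ s.t. $x_{1,j} = x_{2,j}$ for $j > j_0$ and $x_{1,j} < x_{2,j}$ for $j = j_0$. Therefore $\mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) = \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j)$ for $j > j_0$ and $\mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) < \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j)$ for $j = j_0$. Hence

\[ \mathcal{E}_p^{\lambda,\mu}(x_1, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) \cdot (2^{\mu'})^j < \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j) \cdot (2^{\mu'})^j = \mathcal{E}_p^{\lambda,\mu}(x_2, k). \]

(ii) $x_1 = x_2$. It can be proved that $\mathcal{E}_p^{\lambda,\mu}(x_1, k) = \mathcal{E}_p^{\lambda,\mu}(x_2, k)$ analogously to (i).

(iii) $x_1 > x_2$. It can be proved that $\mathcal{E}_p^{\lambda,\mu}(x_1, k) > \mathcal{E}_p^{\lambda,\mu}(x_2, k)$ analogously to (i).

According to (i), (ii), and (iii), $\mathcal{E}_p^{\lambda,\mu}$ is an OPE algorithm.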


5.6.3 Security of $\mathcal{SE}_p^{\lambda,\mu}$

According to the construction in Figure 5.1, $\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$. The security of $\mathcal{E}_p^{\lambda,\mu}(x, k)$ can be reduced to the security of $\mathcal{E}^{\lambda',\mu'}(x_j, k_j)$ where $\lambda' = \lambda/p$, $\mu' = \mu/p$, and $0 \le j < p$. According to [77], there exists an OPE scheme $\mathcal{SE}^{\lambda',\mu'}$ that achieves one-wayness security where (1) $\mu' \ge 3\lambda'$ and (2) $h$ (the number of plaintext-ciphertext pairs known by the adversary) is bounded by a polynomial of $\lambda'$. Hence the values of $\mu$ and $p$ are critical to the security of $\mathcal{SE}_p^{\lambda,\mu}$. We set $\mu \ge 3\lambda$ to satisfy (1), and set $p = O(\lambda^c)$ to satisfy (2), where $0 < c < 1$ is a constant. The one-wayness security of $\mathcal{SE}_p^{\lambda,\mu}$ is proved in Theorem 5.6.2.

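Theorem 5.6.2: Assume that there is an OPE scheme $\mathcal{SE}^{\lambda',\mu'} = (\mathcal{K}^{\lambda',\mu'}, \mathcal{E}^{\lambda',\mu'}, \mathcal{D}^{\lambda',\mu'})$ that achieves one-wayness security for $\mu' \ge 3\lambda'$. Consider the DOPE scheme $\mathcal{SE}_p^{\lambda,\mu}$ constructed based on $\mathcal{SE}^{\lambda',\mu'}$ in Figure 5.1. Then $\mathcal{SE}_p^{\lambda,\mu}$ also achieves one-wayness security for $\mu \ge 3\lambda$ and $p = O(\lambda^c)$, $0 < c < 1$, even if the adversary knows a proper subset of the keys in $k$. Specifically,

\[ \Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda), \]

for $\mu \ge 3\lambda$, where $KP = \{(x_i', \mathcal{E}_p^{\lambda,\mu}(x_i', k)) \mid 1 \le i \le h\}$, and $k' \subset k = \{k_0, \ldots, k_{p-1}\}$.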

Proof. We reduce what the adversary views in the plaintext domain $\{0,1\}^{\lambda}$ and ciphertext range $\{0,1\}^{\mu}$ to $\{0,1\}^{\lambda'}$ and $\{0,1\}^{\mu'}$, where $\lambda' = \lambda/p$ and $\mu' = \mu/p$. Suppose that

\[ x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j. \]

Then $\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$. It implies that the adversary knows $\mathcal{E}^{\lambda',\mu'}(x_j, k_j)$ for $0 \le j < p$. Suppose that

\[ x_i' = \sum_{0 \le j < p} x_{i,j}' \cdot (2^{\lambda'})^j \]

for $1 \le i \le h$. Then $\mathcal{E}_p^{\lambda,\mu}(x_i', k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{i,j}', k_j) \cdot (2^{\mu'})^j$. It implies that the adversary knows

\[ KP_j = \{(x_{i,j}', \mathcal{E}^{\lambda',\mu'}(x_{i,j}', k_j)) \mid 1 \le i \le h\} \]

for $0 \le j < p$. Since the $k_j$ are independently generated for $0 \le j < p$ and $\Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j, k') = x_j] = 1$ for $k_j \in k'$,

\[ \Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \prod_{0 \le j < p} \Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j, k') = x_j] = \prod_{k_j \notin k'} \Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j) = x_j]. \]

Since $\mu \ge 3\lambda$, $\mu' = \mu/p \ge 3(\lambda/p) = 3\lambda'$. Since $h$ is bounded by a polynomial of $\lambda$ and $p = O(\lambda^c)$, $0 < c < 1$, $h$ is also bounded by a polynomial of $\lambda' = \lambda/p$. Therefore

\[ \Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j) = x_j] = \nu(\lambda') \]

for $k_j \notin k'$. Since $p = O(\lambda^c)$ for some constant $0 < c < 1$, a negligible function of $\lambda' = \lambda/p$ is also a negligible function of $\lambda$. Hence $\Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda)$.

5.6.4 The Basic DOPE Communication Protocol

Let $KA_i$, $0 \le i \le p-1$, be the $p$ key agents. Without loss of generality, we assume that $p = O(1)$, i.e., there is a constant number of key agents. Let $\mathcal{SE}_{\lambda',\mu'} = (\mathcal{K}_{\lambda',\mu'}, \mathcal{E}_{\lambda',\mu'}, \mathcal{D}_{\lambda',\mu'})$ be the underlying OPE scheme, where $\mu \ge 3\lambda$, $\lambda' = \lambda/p$, and $\mu' = \mu/p$. We assume that at system initialization time, some trusted party uses $\mathcal{K}_{\lambda,\mu}$ to generate $k = (k_0, \ldots, k_{p-1})$ and distributes $k_i$ to $KA_i$, $0 \le i \le p-1$. The basic-DOPE protocol realizes the DOPE encryption scheme through $KA_i$, $0 \le i \le p-1$, and its pseudo code is presented in Figure 5.2. Figure 5.3 shows the structure and message flow of the basic-DOPE protocol.


The efficiency, correctness (i.e., the DOPE encryption result is the same as the ciphertext COPE(x)), and security of the basic-DOPE protocol are established in Theorem 5.6.3.

Let plaintext $x \in \{0,1\}^{\lambda}$, $\lambda' = \lambda/p$, $\mu' = \mu/p$, and $p = O(1)$.
(1) The user $U_i$ expresses $x$ in base $2^{\lambda'}$, i.e., $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$, $0 \le x_j < 2^{\lambda'}$. $U_i$ sends $x_j$ to $KA_j$, $0 \le j \le p-1$.
(2) For $0 \le j \le p-1$, $KA_j$ computes $y_j = \mathcal{E}_{\lambda',\mu'}(x_j, k_j)$ and sends $y_j$ to DB.
(3) DB combines COPE(x) $= \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j$.

Figure 5.2. The pseudo code for the basic-DOPE protocol.

[Figure 5.3 diagram: the user $U_i$ sends the digit $x_j$ to the key agent $KA_j$, which holds $(\mathcal{E}_{\lambda',\mu'}, k_j)$, for $0 \le j \le p-1$; each $KA_j$ sends $y_j$ to DB.]

Figure 5.3. The structure and message flow of the basic-DOPE protocol.

Theorem 5.6.3: The basic-DOPE protocol is efficient and correct, and achieves one-wayness security against the adversary structure $AS = \{U_{\mathcal{A}} \cup KA_{\mathcal{A}},\ U_{\mathcal{A}} \cup KA_{\mathcal{A}} \cup \{DB\} \mid U_{\mathcal{A}} \subset U,\ KA_{\mathcal{A}} \subset KA\}$.

Proof. Since $\lambda' = \lambda/p$ and $\mu' = \mu/p$, the OPE algorithm $\mathcal{E}_{\lambda',\mu'}$ is efficient. Also, the processes of expressing $x$ in base $2^{\lambda'}$ and combining the encryptions into COPE(x) are efficient. Therefore, the basic-DOPE protocol is efficient. The basic-DOPE protocol is correct because the DB receives

COPE(x) $= \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j = \sum_{0 \le j < p} \mathcal{E}_{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j = \mathcal{E}^{p}_{\lambda,\mu}(x, k)$.

For security, we assume that the adversary $\mathcal{A}$ compromises DB, $U_{\mathcal{A}} \subset U$, and $KA_{\mathcal{A}} \subset KA$. Then the adversary knows some plaintext-ciphertext pairs in the set $KP$, where the plaintexts are from the users in $U_{\mathcal{A}}$ and the ciphertexts are from DB. Also, the adversary $\mathcal{A}$ can retrieve the keys in $k'$, where $k' = \{k_j \mid KA_j \in KA_{\mathcal{A}}\}$. Now consider a user $U_i \notin U_{\mathcal{A}}$ that sends $\mathcal{E}^{p}_{\lambda,\mu}(x, k)$ to DB. The adversary's task is then equivalent to compromising $\mathcal{E}^{p}_{\lambda,\mu}(x, k)$ given $KP$ and $k'$. Since $\mu \ge 3\lambda$ and $p = O(1)$, according to Theorem 5.6.2, $\Pr[\mathcal{A}(\mathcal{E}^{p}_{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda)$. Hence, the basic-DOPE protocol achieves one-wayness security against the adversary structure $AS$.

5.7 The OE-DOPE Protocol for Multi-user Systems

5.7.1 Security Issue in the Basic DOPE Protocol

Consider the following two attacks against the basic-DOPE protocol.

(1) The adversary $\mathcal{A}$ compromises the key agent $KA_u$. Then $\mathcal{A}$ can view the "digit" $x_u$ of the plaintext $x$ in the process of the basic-DOPE protocol.

(2) The adversary $\mathcal{A}$ compromises DB and the key agent $KA_u$ for some $0 \le u < p$. Then, for any ciphertext COPE(x) $= \sum_{0 \le u < p} \mathcal{E}_{\lambda',\mu'}(x_u, k_u) \cdot (2^{\mu'})^u$ stored on DB, $\mathcal{A}$ can use the key $k_u$ retrieved from $KA_u$ to compute $x_u = \mathcal{D}_{\lambda',\mu'}(\mathcal{E}_{\lambda',\mu'}(x_u, k_u), k_u)$.

In both situations, the adversary $\mathcal{A}$ retrieves the partial information $x_u$ of $x$, which implies that $\lambda' = \lambda/p$ bits of the critical data item are leaked. However, since we assume that $\mathcal{A}$ cannot compromise all key agents, $\mathcal{A}$ cannot retrieve the whole plaintext $x$. That is why the basic-DOPE protocol can achieve one-wayness security. But revealing $\lambda/p$ bits of some critical data items may be unacceptable in many applications. Hence, it is desirable to enhance the security of the request communication protocol.
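Attack (2) can be made concrete with a short sketch. It assumes a hypothetical toy OPE (a random strictly increasing table, decrypted by lookup) in place of the real underlying scheme; `leak_digit` and all parameter values are illustrative names and choices, not part of the protocol description.

```python
import random

def toy_keygen(lam, mu, seed):
    """Toy OPE key (assumption): random strictly increasing table."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(2 ** mu), 2 ** lam))

def dope_encrypt(x, keys, lam_d, mu_d):
    """COPE(x): digit-wise encryption recombined in base 2^mu_d."""
    cipher = 0
    for u, key in enumerate(keys):
        digit = (x >> (lam_d * u)) & ((1 << lam_d) - 1)
        cipher += key[digit] << (mu_d * u)
    return cipher

def leak_digit(cipher, u, key_u, mu_d):
    """What an adversary holding DB and KA_u can do: cut the u-th digit
    ciphertext out of COPE(x) and decrypt it with the stolen key k_u."""
    y_u = (cipher >> (mu_d * u)) & ((1 << mu_d) - 1)
    return key_u.index(y_u)          # toy decryption by table lookup

LAM_D, MU_D, P = 4, 12, 4            # lambda = 16 bits, four key agents
KEYS = [toy_keygen(LAM_D, MU_D, seed=u) for u in range(P)]
X = 0xBEEF
COPE_X = dope_encrypt(X, KEYS, LAM_D, MU_D)
LEAKED = leak_digit(COPE_X, 2, KEYS[2], MU_D)   # adversary holds only k_2
```

With a single stolen key the adversary learns exactly one digit of x, i.e., $\lambda/p$ bits, while the remaining digits stay protected by the keys it does not hold.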


5.7.2 Oblivious Encryption

The attack in (2) is relatively easy to prevent. We substitute $KA_u$ by a chain of key agents $KA_{u,0}, \ldots, KA_{u,q-1}$. The key $k_u$ is also split into $k_{u,0}, \ldots, k_{u,q-1}$ and distributed to $KA_{u,0}, \ldots, KA_{u,q-1}$, for all $u$. The critical data $x_u$ is encrypted through the chain $KA_{u,0}, \ldots, KA_{u,q-1}$ by the OPEs $\mathcal{E}_{\lambda_0,\mu_0}, \ldots, \mathcal{E}_{\lambda_{q-1},\mu_{q-1}}$. The resulting ciphertext, after being encrypted by the chain of KAs, is order preserving because the composition of OPEs is still an OPE. Now the adversary cannot retrieve $x_u$ from COPE(x) unless it retrieves the $q$ keys $k_{u,0}, \ldots, k_{u,q-1}$ by compromising the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$.

In principle, the attack in (1) can be prevented by secure computation, where the user has the input $x_u$ and the key agent has the input $k_u$. The user and the key agent can securely compute the function $\mathcal{E}_{\lambda',\mu'}$, $\lambda' = \lambda/p$ and $\mu' = \mu/p$, by any two-party computation protocol. However, existing two-party computation protocols have high overhead. Therefore we develop the technique of OE (oblivious encryption) to enable the key agent to encrypt $x_u$ without knowing the actual value of $x_u$ (i.e., the probability for the key agent to know $x_u$ is negligible). In OE, $x_u$ is further expressed in the base $2^{\lambda''}$ number system, where $\lambda'' = \lambda'/t$ and $t = \lambda'^c$, $0 < c < 1$. Let $x_{u,0}, \ldots, x_{u,t-1}$ be the $t$ "micro-digits" of $x_u$. Then, in the "micro-digit" domain $\{0,1\}^{\lambda''}$ of $x_{u,v}$, the user sends a vector consisting of $x_{u,v}$ and $\lambda'' - 1$ random plaintexts to the key agent $KA_u$, $0 \le v < t$. $KA_u$ encrypts all of the elements in the vector ($KA_u$ does not know which one is $x_{u,v}$) and sends the encrypted vector to the DB. At the same time, the user sends the location $l_{u,v}$ of $x_{u,v}$ within the vector of $\lambda''$ plaintexts to the DB, so that the DB can identify the encrypted $x_{u,v}$ and integrate these encryptions into COPE(x). By further dividing the digits into $t$ micro-digits, the probability for $KA_u$ to successfully guess $x_u$ drops to $1/(\lambda'')^t$, which is a negligible function of $\lambda$.
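The handling of a single micro-digit can be sketched as follows. This is a minimal illustration under stated assumptions: the OPE is a hypothetical toy lookup table, and the helper names `build_vector`, `key_agent_encrypt`, and `db_select` are invented for the example.

```python
import random

def build_vector(x_uv, lam2, rng):
    """User side: hide x_uv among lam2 - 1 distinct random decoys from
    {0,1}^lam2 and return the sorted vector plus the location of x_uv."""
    others = [v for v in range(2 ** lam2) if v != x_uv]
    w = sorted(rng.sample(others, lam2 - 1) + [x_uv])
    return w, w.index(x_uv)

def key_agent_encrypt(w, key):
    """Key agent side: encrypt every element without knowing which is real."""
    return [key[v] for v in w]

def db_select(encrypted_w, location):
    """DB side: keep only the element the user pointed to."""
    return encrypted_w[location]

LAM2 = 4                                   # lambda'' (illustrative value)
rng = random.Random(7)
# Toy OPE key (assumption): strictly increasing table into {0,1}^(3*lambda'').
KEY = sorted(rng.sample(range(2 ** (3 * LAM2)), 2 ** LAM2))
X_UV = 11
W, LOC = build_vector(X_UV, LAM2, rng)
SELECTED = db_select(key_agent_encrypt(W, KEY), LOC)
```

The key agent sees $\lambda''$ equally plausible candidates per micro-digit, which is where the $1/(\lambda'')^t$ guessing probability over $t$ micro-digits comes from.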


5.7.3 Vector Permutation and Data Mutation

The protocol above has a new security issue. If both $KA_{u,0}$ (the first key agent in the chain) and the DB are compromised, then the location information (sent from the user to DB) can be used to identify $x_{u,v}$ among the $\lambda''$ plaintexts (sent from the user to $KA_{u,0}$). Consequently, $x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$ can be derived. To cope with this attack, the key agent $KA_{u,j}$ permutes the vector (original or encrypted) by a permutation $\pi_{u,v,j}$ (randomly generated by the user) before sending it to the next key agent $KA_{u,j+1}$. Thus, $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ (instead of $l_{u,v}$) is the location information the user sends to the DB. However, permutations alone cannot guarantee security because the encryption $\mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$ preserves the order. Thus $\mathcal{A}$ can still correctly link the $\lambda''$ plaintexts (retrieved from $KA_{u,0}$) to the $\lambda''$ ciphertexts (retrieved from DB) according to their order and hence mount the above attack again. To prevent the adversary from using the order to establish the links, each key agent $KA_{u,j}$, $1 \le j < q$, substitutes half of the elements in the vector (the set of locations is provided by the user) with new random values to change the rank of the ciphertext of $x_{u,v}$ in the vector. Consequently, the adversary cannot use the location information and the order information to identify $x_{u,v}$ among the $\lambda''$ plaintexts (sent from the user to $KA_{u,0}$).

We now construct the OE-DOPE protocol. Let $KA = \{KA_{u,j} \mid 0 \le u < p,\ 0 \le j < q\}$, which is logically a KA grid of dimension $p \times q$. We assume that there is a fixed number of key agents and, hence, $p, q = O(1)$. Let $\{0,1\}^{\lambda}$ be the plaintext domain, $\lambda' = \lambda/p$, and $\lambda'' = \lambda'/t$ where $t = (\lambda')^c$ for some constant $0 < c < 1$. For $0 \le j < q$, let $\mathcal{SE}_{\lambda_j,\mu_j} = (\mathcal{K}_{\lambda_j,\mu_j}, \mathcal{E}_{\lambda_j,\mu_j}, \mathcal{D}_{\lambda_j,\mu_j})$ be the OPE schemes satisfying $\lambda_0 = \lambda''$, $\mu_j = \lambda_{j+1}$ for $0 \le j < q-1$, and $\mu_j = 3\lambda_j$ for $0 \le j < q$. Therefore, $\mathcal{SE}_{\lambda'',\mu''} = (\mathcal{K}_{\lambda'',\mu''}, \mathcal{E}_{\lambda'',\mu''}, \mathcal{D}_{\lambda'',\mu''})$ is also an OPE scheme, where $\mathcal{E}_{\lambda'',\mu''} = \mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$. Since $\mu_j = 3\lambda_j$ for $0 \le j < q$, we have $\mu'' = 3^q \cdot \lambda'' > 3\lambda''$. Also, since $q$ is a constant, we have $\mu'' = O(\lambda'')$ and, hence, $\mathcal{SE}_{\lambda'',\mu''}$ is an efficient OPE scheme. We assume that at system initialization time, some trusted party has used $\mathcal{K}_{\lambda_j,\mu_j}$ to generate the keys $k_j = (k_{0,j}, \ldots, k_{p-1,j})$ and distributed $k_{u,j}$ to $KA_{u,j}$, $0 \le u < p$ and $0 \le j < q$. The pseudo code of the OE-DOPE protocol is shown in Figure 5.4. The structure and message flow of the OE-DOPE protocol are illustrated in Figure 5.5.

Let plaintext $x \in \{0,1\}^{\lambda}$, $\lambda' = \lambda/p$, and $\lambda'' = \lambda'/t$ where $t = (\lambda')^c$ for some constant $0 < c < 1$.

The user $U_i$:
(1) Expresses $x$ in base $2^{\lambda'}$, i.e., $x = \sum_{0 \le u < p} x_u \cdot (2^{\lambda'})^u$, $0 \le x_u < 2^{\lambda'}$, and further expresses $x_u$ in base $2^{\lambda''}$, i.e., $x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$, $0 \le x_{u,v} < 2^{\lambda''}$.
(2) For each $x_{u,v}$, randomly selects $\lambda'' - 1$ distinct elements in $\{0,1\}^{\lambda''}$. Let the elements together with $x_{u,v}$ be $w_{u,v,0} < \cdots < w_{u,v,l_{u,v}} = x_{u,v} < \cdots < w_{u,v,\lambda''-1}$.
(3) Randomly generates the permutations $\pi_{u,v,j}: \{0, \ldots, \lambda''-1\} \to \{0, \ldots, \lambda''-1\}$, $0 \le u < p$, $0 \le v < t$, and $0 \le j < q$.
(4) Randomly selects the sets of locations $L_{u,v,j}$ satisfying $\pi_{u,v,j-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v}) \notin L_{u,v,j} \subset \{0, \ldots, \lambda''-1\}$, $|L_{u,v,j}| = \lambda''/2$, $0 \le u < p$, $0 \le v < t$, and $1 \le j < q$.
(5) Sends $W_{u,v} = (w_{u,v,0}, \ldots, w_{u,v,\lambda''-1})$ to $KA_{u,0}$, $0 \le u < p$ and $0 \le v < t$; sends $\pi_{u,v,0}$ to $KA_{u,0}$ and $(\pi_{u,v,j}, L_{u,v,j})$ to $KA_{u,j}$, $0 \le u < p$, $0 \le v < t$, and $1 \le j < q$; sends $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ to DB, $0 \le u < p$ and $0 \le v < t$.

The key agent $KA_{u,j}$:
(1) Let $W^{(0)}_{u,v} = W_{u,v}$. For $0 \le j < q-1$, $KA_{u,j}$ encrypts every element in $W^{(j)}_{u,v}$ by $\mathcal{E}_{\lambda_j,\mu_j}$ using the key $k_{u,j}$ except those at the locations in $L_{u,v,j}$, uses random values as the encryption results for the elements at the locations in $L_{u,v,j}$, permutes the order of the encryptions by $\pi_{u,v,j}$, and sends the result $W^{(j+1)}_{u,v}$ to $KA_{u,j+1}$.
(2) $KA_{u,q-1}$ encrypts every element in $W^{(q-1)}_{u,v}$, permutes the order of the encryptions by $\pi_{u,v,q-1}$, and sends the result $W^{(q)}_{u,v}$ to DB.

The DB:
(1) Selects the $l'_{u,v}$-th element in $W^{(q)}_{u,v}$ to compute COPE(x) $= \sum_{0 \le u < p} \left( \sum_{0 \le v < t} W^{(q)}_{u,v}[l'_{u,v}] \cdot (2^{\mu''})^v \right) \cdot (2^{\mu'})^u$.

Figure 5.4. The OE-DOPE protocol.


As shown in Figure 5.4, the user $U_i$ expresses the plaintext $x$ in the base $2^{\lambda'}$ number system and further expresses the "digit" $x_u$ in the base $2^{\lambda''}$ number system. For the "micro-digit" $x_{u,v}$, the vector $W_{u,v} = (w_{u,v,0}, \ldots, w_{u,v,\lambda''-1})$ is created such that $x_{u,v}$ is in $W_{u,v}$ at a random position $l_{u,v}$. The user also randomly generates the permutations $\pi_{u,v,j}$ and the sets of locations $L_{u,v,j}$, $|L_{u,v,j}| = \lambda''/2$. $(\pi_{u,v,j}, L_{u,v,j})$ is sent to $KA_{u,j}$, $0 \le u < p$, $0 \le v < t$, and $0 \le j < q$, and $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ is sent to DB, $0 \le u < p$ and $0 \le v < t$. Then $W_{u,v}$ is sent to $KA_{u,0}$ so that the elements in $W_{u,v}$ are encrypted, substituted, and permuted through $KA_{u,0}$ to $KA_{u,q-1}$. Finally, the DB identifies the encryptions of $x_{u,v}$ according to $l'_{u,v}$ and integrates them to get COPE(x). According to this process, COPE(x) has the following encryption structure: COPE(x) is the ciphertext encrypted by the DOPE scheme $\mathcal{SE}^{p}_{\lambda,\mu}$; $\mathcal{SE}^{p}_{\lambda,\mu}$ is based on the underlying DOPE scheme $\mathcal{SE}^{t}_{\lambda',\mu'}$; and $\mathcal{SE}^{t}_{\lambda',\mu'}$ is based on the underlying OPE scheme $\mathcal{SE}_{\lambda'',\mu''}$, where $\mathcal{E}_{\lambda'',\mu''} = \mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$.
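The path of one micro-digit through a chain of key agents can be simulated with the sketch below. Several points are assumptions rather than the protocol as specified: the OPE is an insecure linear stand-in $x \mapsto a x + b$ (chosen only because it stays order preserving at any bit length), substitution is applied by the agents with $1 \le j < q$ as described in Section 5.7.3, all roles run inside the single hypothetical function `run_chain`, and the location is tracked under the convention that the element at position $i$ moves to position $\pi(i)$.

```python
import random

def toy_keygen(lam, mu, rng):
    """Insecure stand-in OPE key (a, b): x -> a*x + b maps
    {0,1}^lam into {0,1}^mu and is strictly increasing."""
    span = 2 ** (mu - lam)
    return rng.randrange(1, span), rng.randrange(span)

def toy_encrypt(x, key):
    a, b = key
    return a * x + b

def permute(vec, perm):
    """Move the element at position i to position perm[i]."""
    out = [None] * len(vec)
    for i, value in enumerate(vec):
        out[perm[i]] = value
    return out

def run_chain(x_uv, lam2, q, seed):
    """Return (element the DB selects, expected chained encryption of x_uv)."""
    rng = random.Random(seed)
    n = lam2                                        # vector length lambda''
    lams = [lam2 * 3 ** j for j in range(q)]        # lambda_j, mu_j = 3*lambda_j
    keys = [toy_keygen(lam, 3 * lam, rng) for lam in lams]
    perms = [rng.sample(range(n), n) for _ in range(q)]
    decoys = rng.sample([v for v in range(2 ** lam2) if v != x_uv], n - 1)
    w = sorted(decoys + [x_uv])
    loc = w.index(x_uv)                             # l_{u,v}
    expected = x_uv
    for j in range(q):
        subst = set()
        if j >= 1:                                  # never substitute the real one
            subst = set(rng.sample([i for i in range(n) if i != loc], n // 2))
        w = [rng.randrange(2 ** (3 * lams[j])) if i in subst
             else toy_encrypt(v, keys[j]) for i, v in enumerate(w)]
        w = permute(w, perms[j])
        loc = perms[j][loc]                         # ends as l'_{u,v}
        expected = toy_encrypt(expected, keys[j])
    return w[loc], expected

SELECTED, EXPECTED = run_chain(x_uv=5, lam2=4, q=3, seed=1)
```

The value the DB picks at $l'_{u,v}$ equals the chained encryption of $x_{u,v}$, while the substituted decoys no longer carry any order relation to the plaintext vector seen by $KA_{u,0}$.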

[Figure 5.5 diagram: the user (labeled $C_i$ in the figure) sends $W^{(0)}_{u,v}$ and $\pi_{u,v,0}$ to $KA_{u,0}$, which holds $(\mathcal{E}_{\lambda_0,\mu_0}, k_{u,0})$; each key agent forwards its result along its chain, e.g., $W^{(1)}_{u,v}$, until $KA_{u,q-1}$, which holds $(\mathcal{E}_{\lambda_{q-1},\mu_{q-1}}, k_{u,q-1})$ and receives $(\pi_{u,v,q-1}, L_{u,v,q-1})$ from the user, sends $W^{(q)}_{u,v}$ to DB; the user sends $l'_{u,v}$ to DB. The chains for $u = 0, \ldots, p-1$ run in parallel.]

Figure 5.5. Message flow of the OE-DOPE protocol.

We now prove the efficiency, correctness (i.e., the ciphertext COPE(x) preserves the order of the plaintexts), and security of the OE-DOPE protocol in the following theorem.


Theorem 5.7.1: The OE-DOPE protocol is efficient and correct. Furthermore, consider the adversary structure $AS$. Suppose that a user $U_i \notin U_{\mathcal{A}}$ sends $x$ to DB. If the adversary $\mathcal{A}$ does not compromise all the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously, then the probability for $\mathcal{A}$ to retrieve the "digit" $x_u$ of $x$ is negligible, where $x = \sum_{0 \le u < p} x_u \cdot (2^{\lambda'})^u$.

Proof. The efficiency can be proved by a routine check. In the protocol, the user needs to create $W_{u,v}$, consisting of $\lambda''$ elements in $\{0,1\}^{\lambda''}$, $0 \le u < p$ and $0 \le v < t$. Since $p$ is a constant, $\lambda'' = \lambda'/t = \lambda/(p \cdot t)$, and $t = \lambda'^c$, $0 < c < 1$, it is efficient for the user to create $W_{u,v}$. Also, it is efficient for the user to create $\pi_{u,v,j}$ and $L_{u,v,j}$, $0 \le u < p$, $0 \le v < t$, and $0 \le j < q$. Then $KA_{u,j}$ needs to encrypt the elements in $W^{(j)}_{u,v}$ by using $\mathcal{E}_{\lambda_j,\mu_j}$. Note that $\lambda_0 = \lambda'' = \lambda/(p \cdot t)$, $\mu_j = \lambda_{j+1}$ for $0 \le j < q-1$, and $\mu_j = 3\lambda_j$ for $0 \le j < q$. Since $p$ and $q$ are constants, $\lambda_j, \mu_j = O(\lambda)$. Therefore the encryption $\mathcal{E}_{\lambda_j,\mu_j}$ is efficient. It is also efficient to apply the permutation $\pi_{u,v,j}$ and the substitution at the locations $L_{u,v,j}$ to the encryptions. Finally, it is efficient for the DB to identify the locations of the encryptions of $x_{u,v}$ and integrate them to get COPE(x).

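According to the encryption structure, COPE(x) is the ciphertext encrypted by the basic DOPE scheme $\mathcal{SE}^{p}_{\lambda,\mu}$. $\mathcal{SE}^{p}_{\lambda,\mu}$ is based on the underlying basic DOPE scheme $\mathcal{SE}^{t}_{\lambda',\mu'}$, and $\mathcal{SE}^{t}_{\lambda',\mu'}$ is based on the underlying OPE scheme $\mathcal{SE}_{\lambda'',\mu''}$, where $\mathcal{E}_{\lambda'',\mu''} = \mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$. Hence COPE(x) preserves order and the OE-DOPE protocol is correct.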

For security, first note that if $\mathcal{E}_{\lambda_j,\mu_j}$ is a ROPF (random order-preserving function) for $0 \le j < q$, then the composition $\mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$ is also a ROPF. Therefore the basic DOPE scheme $\mathcal{SE}^{t}_{\lambda',\mu'}$ based on the underlying OPE scheme $\mathcal{SE}_{\lambda'',\mu''}$, where $\mathcal{E}_{\lambda'',\mu''} = \mathcal{E}_{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}_{\lambda_0,\mu_0}$, has one-wayness security according to Theorem 5.6.1. Hence, the basic DOPE scheme $\mathcal{SE}^{p}_{\lambda,\mu}$ based on the underlying basic DOPE scheme $\mathcal{SE}^{t}_{\lambda',\mu'}$ also achieves one-wayness security according to Theorem 5.6.1. Now suppose that the adversary $\mathcal{A}$ compromises DB and retrieves COPE(x). Then $\mathcal{A}$ can derive $\mathcal{E}_{\lambda'',\mu''}(x_{u,v}) = \mathcal{E}_{\lambda_{q-1},\mu_{q-1}}(\cdots \mathcal{E}_{\lambda_0,\mu_0}(x_{u,v}, k_{u,0}) \cdots, k_{u,q-1})$ from COPE(x), $0 \le u < p$ and $0 \le v < t$. In order to retrieve $x_u$, $\mathcal{A}$ needs to retrieve all $x_{u,v}$ for $0 \le v < t$. Since $\mathcal{E}_{\lambda_j,\mu_j}$ has one-wayness security and $\mathcal{A}$ does not compromise all $KA_{u,j}$ for $0 \le j < q$, the probability for $\mathcal{A}$ to derive $x_u$ is negligible. Additionally, since the adversary $\mathcal{A}$ does not compromise the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously, $\mathcal{A}$ cannot retrieve all $\pi_{u,v,j}$ for $0 \le j < q$, or all $k_{u,j}$ for $0 \le j < q$. Furthermore, the order information in $W^{(q)}_{u,v}$ cannot be used to link its elements to the plaintexts in $W_{u,v}$. Thus, $\mathcal{A}$ cannot identify $x_{u,v}$ in the vector $W_{u,v}$ even if it compromises $KA_{u,0}$ and DB. But if the adversary compromises the key agent $KA_{u,j}$, $1 \le j < q$, it can narrow down $x_{u,v}$ from $\lambda''$ elements to $\lambda''/2$ elements based on $L_{u,v,j}$. Hence the probability for the adversary to retrieve $x_{u,v}$ in $W_{u,v}$ is at most $2^{q-1}/\lambda''$ (note that the adversary can compromise at most $q-1$ key agents in a chain). Consequently, the probability for the adversary to retrieve $x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$ is $2^{q-1}/(\lambda'')^{t}$. Since $p, q = O(1)$ and $t = \lambda'^c = (\lambda/p)^c$ for some constant $0 < c < 1$, $2^{q-1}/(\lambda'')^{t}$ is a negligible function of $\lambda$. Hence, the probability for $\mathcal{A}$ to retrieve $x_u$ of $x$ is negligible.

Note that if the adversary $\mathcal{A}$ compromises fewer than $q$ key agents, then $\mathcal{A}$ does not compromise the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously for any $0 \le u < p$. According to Theorem 5.7.1, the probability for $\mathcal{A}$ to retrieve any "digit" $x_u$ of $x$ is negligible. We summarize the conclusion in the following corollary.

Corollary 5.7.2: Consider the adversary structure $AS$. Suppose that a user $U_i \notin U_{\mathcal{A}}$ sends $x$ to DB. If the adversary $\mathcal{A}$ compromises fewer than $q$ key agents, then the probability for $\mathcal{A}$ to retrieve any "digit" of $x$ is negligible.

5.8 Performance Study of OPE and Protocols for Multi-user Systems

We study the performance of the basic-DOPE and OE-DOPE protocols using different underlying OPE schemes. To the best of our knowledge, five OPE schemes have been proposed in the literature [1, 6, 12, 39, 53]. None of them, except for the OPE algorithm proposed in [12], has a cryptographic security proof. The OPE algorithm proposed in [6] can only be used in a static system where no new data can be inserted into the database. The algorithm given in [39] is not a full solution because it cannot compare all the plaintexts. The OPE algorithm developed in [1] needs to process the whole database to model the data distribution. Thus, we only consider the OPE algorithms proposed in [12] and [53] in our experimental study.

The performance of the OPE schemes has never been analyzed in the literature. Thus, we first study the performance of the Hyper and Poly OPE schemes. We randomly generate a polynomial of degree 10 for the Poly scheme. The plaintext domain is $\{0,1\}^{\lambda}$; we choose $\lambda \in \{8, 16, 32, 64, 96, 128, 256, 512, 1024\}$ and $c = 0.5$. The ciphertext range is $\{0,1\}^{\mu}$, and we consider $\mu = 3\lambda$ for the Hyper OPE scheme and $\mu = 10\lambda$ for the Poly OPE scheme. The experiments are run on a 2.50 GHz Intel Core 2 Duo processor. Table 5.1 shows the execution time in milliseconds for Hyper and Poly to encrypt a single critical data item of $\lambda$ bits.

As can be seen, the Hyper OPE scheme is far more expensive than the Poly OPE scheme. In the Hyper OPE scheme, the process of realizing the hypergeometric random variable is very time consuming. The Poly OPE scheme only evaluates the randomly selected polynomial with the plaintext as input, which is much less time consuming than Hyper OPE. However, the Hyper OPE scheme can be proved to achieve one-wayness security, while there is no security proof for the Poly OPE scheme.

Table 5.1. Performance of Hyper and Poly OPE schemes (execution time in milliseconds; "-" marks a value not reported).

λ (bits)    Hyper OPE       Poly OPE
8           20.37022        0.0003
16          4965.81013      0.0007
24          520073.44982    0.0008
32          -               0.0010
64          -               0.0027
128         -               0.0077
256         -               0.0261
512         -               0.0929
1024        -               0.3710

Now we compare the performance of the basic-DOPE and the OE-DOPE protocols integrated with the Hyper and Poly OPE schemes. In the two request communication protocols, the request is sent from the user to the KA and then to the DB. To factor in the communication latencies between the system entities, we allocate the user, the key agents, and the DB to different PlanetLab [57] computers and measure the communication latencies between them. The user is in Dallas and the DB is in Los Angeles. The basic-DOPE protocol requires $p$ key agents and we choose $p = 4$ (so that $\lambda$ is divisible by $p$). The four key agents are allocated to Phoenix (Arizona), Salt Lake City (Utah), Carson City (Nevada), and Eugene (Oregon). The OE-DOPE protocol requires $p \cdot q$ key agents. We use the same $p$ value and consider $q = 2$. The 4 additional key agents (out of 8) are allocated in the same city as the other key agents in the same chain. The request message without the critical data is of size 170 bytes (based on the average of the sizes of some common queries). The critical data size is $\lambda$ bits.

For comparison purposes, we also consider the "No Encryption" (NE) request communication protocol, the Hyper request communication protocol, and the Poly request communication protocol. In NE, the user directly sends the query (with the critical data in plaintext) to DB. In Hyper/Poly, the user knows the master OPE key, encrypts the confidential data using the Hyper/Poly OPE scheme with the master key, and sends the ciphertext directly to DB. Table 5.2 shows the performance comparisons (in milliseconds) of the NE, Hyper, basic-DOPE, and OE-DOPE protocols using the Hyper OPE scheme. Table 5.3 shows the performance comparisons (in milliseconds) of the NE, Poly, basic-DOPE, and OE-DOPE protocols using the Poly OPE scheme.

Table 5.2. Comparisons of the basic-DOPE protocol and OE-DOPE protocol with Hyper OPE scheme (in milliseconds; "-" marks a value not reported).

λ (bits)    NE       Hyper      basic-DOPE + Hyper    OE-DOPE + Hyper
8           85.87    106.24     506.06                7718.90
16          85.92    5051.73    1525.28               317766.88
32          86.03    -          20537.11              9.19E+07
64          86.23    -          4965977.56            -
96          86.44    -          5.2E+08               -

As shown in Tables 5.2 and 5.3, the OE-DOPE protocol is more expensive than the basic-DOPE protocol. This is because: (1) There are extra random data to be transmitted and encrypted in the OE-DOPE protocol to facilitate oblivious encryption. (2) After encryption by one key agent, the ciphertext grows. The longer the chain, the larger the ciphertext becomes. (For example, the Poly OPE scheme takes 0.0028 milliseconds to encrypt a 64-bit plaintext. The resulting ciphertext, which is the plaintext for the next key agent, has 640 bits. It then takes 0.16 milliseconds to encrypt this 640-bit value, which generates a new ciphertext of 6400 bits.) But the OE-DOPE protocol achieves a higher security level than the basic-DOPE protocol. Compared with the baseline NE protocol, the basic-DOPE protocol using the Hyper OPE scheme is over 200 times slower for $\lambda = 32$, and the basic-DOPE protocol using the Poly OPE scheme is at most about 2 times slower for any $\lambda$, $8 \le \lambda \le 1024$. The Poly protocol has a performance similar to that of the NE protocol since the encryption time of Poly OPE is very small for $8 \le \lambda \le 1024$.
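The ciphertext growth along a chain follows directly from the expansion factors used in the experiments ($\mu = 3\lambda$ for Hyper, $\mu = 10\lambda$ for Poly). The helper below, whose name `chain_sizes` is chosen only for this illustration, is a small sketch of that arithmetic.

```python
def chain_sizes(lam, expansion, q):
    """Bit lengths seen along a chain of q key agents when every OPE
    step maps lam bits to expansion * lam bits (mu_j = lambda_{j+1})."""
    sizes = [lam]
    for _ in range(q):
        sizes.append(sizes[-1] * expansion)
    return sizes

POLY_64 = chain_sizes(64, 10, 2)    # Poly OPE, two key agents in the chain
HYPER_8 = chain_sizes(8, 3, 3)      # Hyper OPE, three key agents in the chain
```

For Poly with a 64-bit plaintext and $q = 2$ this gives 64, 640, and 6400 bits, matching the example in the text; in general the final size is the initial size multiplied by the expansion factor to the power $q$.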

Table 5.3. Comparisons of the basic-DOPE protocol and OE-DOPE protocol with Poly OPE scheme (in milliseconds).

λ (bits)    NE       Poly     basic-DOPE + Poly    OE-DOPE + Poly
8           85.87    85.87    166.62               194.35
16          85.92    85.92    167.03               200.88
32          86.03    86.03    167.83               214.18
64          86.23    86.23    169.32               239.34
96          86.44    86.44    170.73               262.81
128         86.64    86.64    172.05               285.09
256         87.43    87.45    176.79               366.91
512         88.92    89.01    184.65               513.02
1024        91.62    91.99    197.23               786.72

Note that in OE-DOPE, the key agents are logically deployed in $p$ rows and $q$ columns. We study the influence of $q$ on the performance of the OE-DOPE protocol. We set $p = 4$ and vary $q$ from 2 to 4. The $p \cdot q$ key agents are physically allocated to Phoenix (Arizona), Salt Lake City (Utah), Carson City (Nevada), and Eugene (Oregon). Key agents in the same row are allocated in the same city. The user is still in Dallas and the DB is still in Los Angeles. The request message without the critical data is of size 170 bytes and the critical data is of size $\lambda$ bits (we vary $\lambda$ in the experiments). The performance results (in milliseconds) of the OE-DOPE protocol using the Hyper OPE scheme (denoted by OHyper) and the Poly OPE scheme (denoted by OPoly) are given in Table 5.4.

Table 5.4. Performance of the OE-DOPE protocol using the Hyper OPE scheme and the Poly OPE scheme for different q (in milliseconds; "-" marks a value not reported).

λ (bits)    OHyper q=2    OHyper q=3    OPoly q=2    OPoly q=3    OPoly q=4
8           7718.9        48776.23      194.35       238.30       402.90
16          317766.88     -             200.88       266.33       516.80
32          9.19E+07      -             214.18       316.26       720.34
64          -             -             239.34       404.95       1183.25
96          -             -             262.81       488.20       1960.84
128         -             -             285.09       580.90       3091.75
256         -             -             366.91       1053.45      11920.64
512         -             -             513.02       1849.54      57943.90
1024        -             -             786.72       5411.71      313460.75

As can be seen, the execution time of the OE-DOPE protocol with either the Hyper OPE or the Poly OPE scheme increases with $q$, and the impact of $q$ is more significant for larger $\lambda$. With $q = 3$ and a 32-bit critical data item, the OE-DOPE protocol takes 0.7 seconds, which is an acceptable performance.

5.9 Summary

In this chapter, we first studied the security of OPE schemes under the known-plaintext attack model, where the adversary knows a set of $h$ random plaintext-ciphertext pairs and is then given a ciphertext (called the challenge) to compromise. An encryption scheme is said to achieve one-wayness security if the probability for any PPT adversary to fully recover the challenge is negligible. We showed that the ideal OPE object achieves one-wayness security, i.e., although the adversary may retrieve some information about the challenge, the probability for the adversary to fully recover the challenge is a negligible function of $\lambda = \log m$ if the number $h$ of known plaintext-ciphertext pairs satisfies $h = o(m^{\varepsilon})$, $0 < \varepsilon < 1$, and $n \ge m^3$. In the security proof (in the appendices), we analyzed the expected number of bits $z_h$ of the plaintext remaining secret from the adversary under known-plaintext attacks. $z_h$ can be formulated by the average min-entropy [22, 25]. First, we derived an upper bound on $z_h$ for any OPE scheme against a known-plaintext attack. Then, we derived a lower bound on $z_h$ for the ideal OPE object. These two inequalities bound the security that the ideal OPE object can achieve and indicate the one-wayness security of the ideal OPE object.

Then, we develop two novel protocols to extend existing OPE schemes for multi-user data-

centric systems. Users can encrypt their secret data using our OPE protocols without knowing

the OPE encryption key. Also, we develop a simple and effective response protocol to allow

efficient delivery of secret data in the response to the user. Our protocols are general and can be

used with any OPE scheme. We have proved their correctness and security. We have also studied

their performance and the results show that the protocols have a fairly reasonable overhead when

the underlying OPE scheme is relatively efficient.


CHAPTER 6

PREFIX-PRESERVING ENCRYPTION

A prefix-preserving encryption (PPE) scheme is a deterministic symmetric-key encryption scheme. The ciphertexts of a PPE scheme preserve the prefixes of the plaintexts, i.e., the longest common prefix of any two ciphertexts has the same length as the longest common prefix of the corresponding plaintexts. This prefix-preserving property enables PPE to support prefix-based computations, such as computation on anonymized IP addresses [78], prefix-matching search [4], and range search [48].

The security of PPE is weakened since some prefix information of the plaintexts is leaked by the ciphertexts. But existing works do not offer a sufficient security analysis of PPE schemes: they either prove security against author-defined attacks or illustrate security based on experiments. Moreover, the security proofs in [4, 78] are incomplete because they prove that the real PPE schemes are computationally indistinguishable from the ideal PPE object (a special PPE) whose security is unknown. If the security of the ideal object is unacceptable, then the proof of indistinguishability between the real scheme and the ideal object gives little security assurance.

In this chapter we first develop a novel mechanism to analyze the security of PPE. We follow the same approach as that in [12] to seek a necessarily and sufficiently weakened security notion to qualify the security of the ideal PPE object defined in Section 6.1. First, we prove that no PPE scheme is secure under IND-CPA by designing a DLLCP attack (in which the adversary queries two plaintext pairs with different lengths of longest common prefix strings) in Section 6.2. Then we weaken the security notion from IND-CPA to IND-PCPA (indistinguishability under the prefixed chosen-plaintext attack) and prove that (1) such a weakened security notion is necessary (otherwise the DLLCP attack will be successful), and (2) the ideal PPE object is secure under IND-PCPA in Section 6.2. From (1) and (2), we conclude that the security notion IND-PCPA exactly qualifies the security of the ideal PPE object.

In Section ‎6.3, we develop a novel distributed PPE algorithm based on the PPE algorithm

𝒠 constructed in ‎[78], and extend PPE to multi-user systems based on the distributed PPE

algorithm by using multiple key agents. Consider a PPE system consisting of a server DB

hosting data encrypted by PPE using a master encryption key k. Assume that a user sends a query

to DB which contains a confidential data x. In our PPE protocol, k is secret shared and

distributed to the group of key agents. The user secret shares its confidential data x and passes

the‎ shares‎ to‎ the‎ key‎ agents.‎ The‎ key‎ agents‎ then‎ “distributedly”‎ encrypt‎ the‎ data‎ shares‎ into‎

cipher shares, which in turn, are assembled into the ciphertext by the DB. We formally prove the

security of our protocol by defining an ideal model for PPE protocols and showing that our PPE

protocol is computationally indistinguishable from the ideal model. We also conduct experiments

to study the performance of the protocol, showing that it has a reasonable overhead.

The experimental study and performance results for our multi-user PPE protocol are presented in
Section 6.4. Finally, we summarize this chapter in Section 6.5.


6.1 Ideal PPE Object

The ideal PPE object is a special PPE such that the encryption function is uniformly

randomly selected from all the prefix-preserving functions defined as follows.

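Definition 6.1.1 (Ideal PPE Object): We say that $\mathcal{SE}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$ is the ideal PPE object if

- $\mathcal{K}^*$ uniformly randomly selects $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l} \triangleq \{g: \{0,1\}^l \to \{0,1\}^l \mid |LCP(x_1,x_2)| = |LCP(g(x_1),g(x_2))|, \forall x_1,x_2 \in \{0,1\}^l\}$;

- $\mathcal{E}^*$ encrypts x to $f(x)$;

- $\mathcal{D}^*$ decrypts y to $f^{-1}(y)$.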

In Lemma 6.1.1, we (1) prove that the prefix-preserving functions are invertible and, hence, the ciphertexts of the ideal PPE can be decrypted, and (2) compute the cardinality of $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, which will be used to prove the equivalence of the prefix-preserving function and the tree-based function in Proposition 8.2.1.

Lemma 6.1.1: $f$ is a bijection for any $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, and $|F^{PPE}_{\{0,1\}^l,\{0,1\}^l}| = 2^{2^l-1}$.

Proof. For $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, since the domain and range of f have the same (finite) cardinality, it suffices to prove that f is injective. Assume that $f(x_1) = f(x_2)$. Then $|LCP(x_1,x_2)| = |LCP(f(x_1),f(x_2))| = l$. Hence $x_1 = x_2$.

Let N(l) denote the number of prefix-preserving functions with domain and range $\{0,1\}^l$. For l = 1 there are two prefix-preserving functions: f(0) = 0, f(1) = 1, and f(0) = 1, f(1) = 0. Thus, N(1) = 2.

Let $f_1$ and $f_2$ denote any two prefix-preserving functions with domain and range $\{0,1\}^{l-1}$. They can be used to construct prefix-preserving functions f and g with domain and range $\{0,1\}^l$. For $x \in \{0,1\}^l$, let $x = x_1 \dots x_l$ where $x_i \in \{0,1\}$, $1 \le i \le l$. We define

$$f(x_1 \dots x_l) \triangleq \begin{cases} 0\,f_1(x_2 \dots x_l) & \text{if } x_1 = 0 \\ 1\,f_2(x_2 \dots x_l) & \text{if } x_1 = 1 \end{cases} \quad \text{and} \quad g(x_1 \dots x_l) \triangleq \begin{cases} 1\,f_1(x_2 \dots x_l) & \text{if } x_1 = 0 \\ 0\,f_2(x_2 \dots x_l) & \text{if } x_1 = 1 \end{cases}$$

It can be verified that f and g are different prefix-preserving functions and that any prefix-preserving function with domain and range $\{0,1\}^l$ must have the form of f or g. Hence, $N(l) = 2N(l-1)^2$. Solving this recurrence with N(1) = 2 gives the closed form $N(l) = 2^{2^l-1}$; indeed, by induction, $2 \cdot (2^{2^{l-1}-1})^2 = 2^{2^l-1}$.
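The count in Lemma 6.1.1 can be sanity-checked by brute force for very small l. The following is only an illustrative sketch: the function names (`lcp_len`, `count_prefix_preserving`, `closed_form`) are hypothetical and not part of the dissertation, and the exhaustive enumeration is only practical for l = 1 and l = 2.

```python
from itertools import product


def lcp_len(a, b):
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n


def count_prefix_preserving(l):
    """Count all g: {0,1}^l -> {0,1}^l with |LCP(x1,x2)| = |LCP(g(x1),g(x2))|."""
    domain = ["".join(bits) for bits in product("01", repeat=l)]
    count = 0
    # Enumerate every function on the domain (feasible only for tiny l).
    for images in product(domain, repeat=len(domain)):
        g = dict(zip(domain, images))
        if all(lcp_len(x1, x2) == lcp_len(g[x1], g[x2])
               for x1 in domain for x2 in domain):
            count += 1
    return count


def closed_form(l):
    """Closed form N(l) = 2^(2^l - 1) from Lemma 6.1.1."""
    return 2 ** (2 ** l - 1)


if __name__ == "__main__":
    for l in (1, 2):
        print(l, count_prefix_preserving(l), closed_form(l))
```

The brute-force count and the closed form agree for l = 1 and l = 2, and the closed form also satisfies the recurrence N(l) = 2N(l-1)^2 used in the proof.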

Now we present the formal definition of the ideal PPE object in Definition 6.1.2.

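Definition 6.1.2 (Ideal PPE Object): We say that $\mathcal{SE}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$ is the ideal PPE object if

- $\mathcal{K}^*$ uniformly randomly selects $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$;

- $\mathcal{E}^*$ encrypts x to $f(x)$;

- $\mathcal{D}^*$ decrypts y to $f^{-1}(y)$.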

Remark 6.1.1: The ideal PPE object is computationally infeasible since it involves choosing f uniformly randomly from the set $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, whose size is on the order of $2^{2^l-1}$. In [4], the authors construct a real PPE scheme $\mathcal{SE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$, and prove that the real PPE scheme is computationally indistinguishable from the ideal PPE object.


6.2 Security of PPE

Existing cryptographic security proofs for PPE schemes only reduce the security of real

PPE schemes to the security of the ideal PPE object by showing that they are computationally

indistinguishable. However it is not a complete security proof since the security of the ideal PPE

object is unknown and there has been no security analysis in the literature to show its security

level. In this section, we complete the existing security proof by developing a security notion

IND-PCPA and showing that it exactly qualifies the ideal PPE object.

IND-CPA is a well established security notion in cryptography. However, PPE schemes

require the ciphertexts to preserve the prefix of the plaintexts and cannot be qualified by IND-

CPA. Consider the following DLLCP (differentiated length of longest common prefix) attack

against the PPE scheme 𝒮𝒠 = (𝒦, 𝒠, 𝒟) with respect to IND-CPA in Figure 6.1.

    (1) In the experiment $\mathbf{EXP}^{IND\text{-}CPA\text{-}b}_{\mathcal{SE},\mathcal{A}}$, $\mathcal{A}$ chooses the set of plaintext pairs
        $\{(x_1^0, x_1^1), (x_2^0, x_2^1) \mid |LCP(x_1^0, x_2^0)| \neq |LCP(x_1^1, x_2^1)|\}$
        and sends it to $\mathcal{LR}$;
    (2) $\mathcal{LR}$ computes the set of ciphertexts $\{\mathcal{E}(x_i^b, k)\}_{1 \le i \le 2}$, and sends it back to $\mathcal{A}$;
    (3) Finally $\mathcal{A}$ outputs $b' = 0$ if $|LCP(x_1^0, x_2^0)| = |LCP(\mathcal{E}(x_1^b, k), \mathcal{E}(x_2^b, k))|$, and $b' = 1$ otherwise.

Figure 6.1. The DLLCP attack.

In the DLLCP attack (shown in Figure 6.1), the adversary $\mathcal{A}$ queries $(x_1^0, x_1^1)$ and $(x_2^0, x_2^1)$, where $|LCP(x_1^0, x_2^0)| \neq |LCP(x_1^1, x_2^1)|$. If b = 0, $x_1^0$ and $x_2^0$ will be encrypted; if b = 1, $x_1^1$ and $x_2^1$ will be encrypted. Since PPE preserves prefix, the adversary can distinguish whether the plaintexts are $x_1^0$ and $x_2^0$ or $x_1^1$ and $x_2^1$ by comparing $|LCP(y_1, y_2)|$ with $|LCP(x_1^0, x_2^0)|$ and $|LCP(x_1^1, x_2^1)|$, where $y_1$ and $y_2$ are the ciphertexts returned by the encryption oracle. If $|LCP(y_1, y_2)| = |LCP(x_1^0, x_2^0)|$, then the plaintexts are $x_1^0$ and $x_2^0$ and, hence, b = 0. If $|LCP(y_1, y_2)| = |LCP(x_1^1, x_2^1)|$, then the plaintexts are $x_1^1$ and $x_2^1$ and, hence, b = 1. Thus, the advantage of the adversary $\mathcal{A}$ is 1. We summarize the conclusion in the following lemma.

conclusion in the following lemma.

Lemma 6.2.1: PPE is not secure under IND-CPA.
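The DLLCP attack can be illustrated with a small simulation. This is a sketch under stated assumptions: `toy_ppe` is a hypothetical bitwise prefix-preserving cipher built from HMAC-SHA256 (a stand-in chosen for illustration, not the scheme analyzed in this chapter), and the plaintext pairs are arbitrary examples whose LCP lengths differ (3 versus 0).

```python
import hashlib
import hmac
import secrets


def toy_ppe(bits, key):
    """Toy prefix-preserving cipher: y_i = x_i XOR PRF_key(x_1 .. x_{i-1})."""
    out = []
    for i, bit in enumerate(bits):
        pad = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()[0] & 1
        out.append(str(int(bit) ^ pad))
    return "".join(out)


def lcp_len(a, b):
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n


def lr_oracle(pairs, b, key):
    """Left-or-right oracle: encrypts x_i^b for each queried pair (x_i^0, x_i^1)."""
    return [toy_ppe(pair[b], key) for pair in pairs]


def dllcp_adversary(pairs, ciphertexts):
    """Output 0 if the ciphertext LCP length matches that of the 'left' plaintexts."""
    (x1_0, _), (x2_0, _) = pairs
    y1, y2 = ciphertexts
    return 0 if lcp_len(y1, y2) == lcp_len(x1_0, x2_0) else 1


# (x_1^0, x_1^1) and (x_2^0, x_2^1): |LCP("0000","0001")| = 3, |LCP("0000","1000")| = 0.
PAIRS = [("0000", "0000"), ("0001", "1000")]


def run_trial():
    key = secrets.token_bytes(16)
    b = secrets.randbelow(2)
    return dllcp_adversary(PAIRS, lr_oracle(PAIRS, b, key)) == b


if __name__ == "__main__":
    print(all(run_trial() for _ in range(100)))
```

Because the toy cipher preserves LCP lengths exactly, the adversary's guess equals b in every trial, which mirrors the advantage of 1 stated above.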

In [12], it has been shown that an OPE scheme does not satisfy IND-CPA, and a weakened security notion IND-OCPA has been defined to qualify the security of OPE schemes (though no OPE schemes satisfy IND-OCPA either and no security notion has been found to properly qualify OPE yet). Inspired by this approach, we define a weakened security notion IND-PCPA to qualify the security of PPE schemes. According to the DLLCP attack, the adversary should only be allowed to query the plaintext pairs in the set

$$PPP_h \triangleq \{ (x_i^0, x_i^1) \in \{0,1\}^l \times \{0,1\}^l,\ 1 \le i \le h \mid |LCP(x_u^0, x_v^0)| = |LCP(x_u^1, x_v^1)|,\ 1 \le u, v \le h \}$$

Accordingly, we define the security notion IND-PCPA (indistinguishability under prefixed

chosen-plaintext attack) in Definition 6.2.1.

Definition 6.2.1 (IND-PCPA): IND-PCPA has the same definition as that of IND-CPA

except that the adversary is only allowed to query the prefixed plaintext pairs in the set PPPh.
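As an illustration of the restriction in Definition 6.2.1, the following sketch checks whether a list of queried plaintext pairs belongs to PPP_h. The function name `is_prefixed_query_set` is hypothetical and only serves this example.

```python
def lcp_len(a, b):
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n


def is_prefixed_query_set(pairs):
    """True if |LCP(x_u^0, x_v^0)| = |LCP(x_u^1, x_v^1)| for all 1 <= u, v <= h."""
    lefts = [p[0] for p in pairs]
    rights = [p[1] for p in pairs]
    # All plaintexts are assumed to be l-bit strings of the same length.
    if len({len(s) for s in lefts + rights}) > 1:
        return False
    h = len(pairs)
    return all(lcp_len(lefts[u], lefts[v]) == lcp_len(rights[u], rights[v])
               for u in range(h) for v in range(h))


if __name__ == "__main__":
    # Allowed under IND-PCPA: both sides have LCP length 3.
    print(is_prefixed_query_set([("0000", "1111"), ("0001", "1110")]))
    # The DLLCP query (LCP lengths 3 and 0) is rejected.
    print(is_prefixed_query_set([("0000", "0000"), ("0001", "1000")]))
```

A query set that passes this check gives the DLLCP adversary nothing to compare, since both candidate plaintext sequences induce the same pattern of common-prefix lengths.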

It is obvious that IND-PCPA is the necessarily weakened security notion (with respect to

indistinguishability and left-or-right encryption oracle) for PPE. We show that it is also the

sufficiently weakened security notion for PPE by proving in Theorem 6.2.2 that the ideal PPE

object is secure under IND-PCPA, where the proof of the theorem is relegated to the appendices.


Therefore, the real PPE schemes computationally indistinguishable from the ideal PPE object are

also secure under IND-PCPA.

Theorem 6.2.2: The ideal PPE object 𝒮𝒠* is secure under IND-PCPA.

6.3 PPE for Multi-user Systems

In this section we develop a security-enhanced protocol to support PPE in multi-user systems. The multi-user system we consider consists of a single server hosting a database and a set of users. Let DB denote the server and $U = \{U_j \mid j \ge 1\}$ denote the set of users. The system operations consist of a request protocol Q, in which a user $U_i$ issues a request (query) q to DB, where q may contain some secret data x that needs to be transmitted with q to DB, and a response protocol P, in which the DB sends back the response r to the user, where r may include a returned data object y in encrypted form. Note that a request or a response may include multiple data objects, but the processing will be the same. For simplicity, we assume that there is only one secret data item in q or r.

The PPE protocol should guarantee functionality requirements including: (1) When q

reaches DB, x should have been encrypted by the PPE using the key k; (2) When r is returned to

the user, the user should be able to obtain the plaintext y of r. The protocol should also satisfy

some security requirements, such as no entity in the system should have the knowledge of the

key k and x should be protected against all system entities but the owner. To avoid informal

security descriptions and facilitate formal security proofs, we define the security requirements

via an ideal model, in which (1) encryption and decryption are performed by a trusted party TP

who holds the key k (key agents replaced by TP); (2) the communication channels between TP


and users/DB are secure. The ideal model implies the highest security level that the real PPE protocol (including Q and P) can achieve, and we will prove that the real protocol is "equivalent" to the ideal model.

The system entities (users, DB, and key agents) may collude to acquire additional information. We unify the possible collusions and construct a passive adversary $\mathcal{A}$ who tries to gain extra information by compromising some entities in the system. We assume that the key agents and DB are better protected than the users, and that the adversary can compromise fewer than t key agents ($t \le \frac{m}{2} + 1$) simultaneously. Thus, the adversary structure is defined as

$$\mathcal{Z} = \{ U_{\mathcal{A}} \cup KA_{\mathcal{A}},\ U_{\mathcal{A}} \cup KA_{\mathcal{A}} \cup \{DB\} \mid U_{\mathcal{A}} \subseteq U,\ KA_{\mathcal{A}} \subseteq KA,\ |KA_{\mathcal{A}}| < t \le \tfrac{m}{2} + 1 \},$$

where $U_{\mathcal{A}}$ is the set of compromised users and $KA_{\mathcal{A}}$ is the set of compromised key agents (note that $U_{\mathcal{A}}$ and $KA_{\mathcal{A}}$ could be empty).

In Subsection ‎6.3.1, we introduce the general system design. Then, we discuss the response

and request protocols in the following two subsections. The proof that shows our PPE protocol

achieves the functionality requirements is given in Subsection ‎6.3.2. In Subsection ‎6.3.3 we

formally define the security requirement and prove that our PPE protocol achieves the

requirement.

6.3.1 General System Design

For convenience, we assume that there are only numerical data in DB. Data of other types can be represented by numerical data easily. For each critical data item x, the DB maintains the ciphertexts $(C_{PPE}(x), C_{CE}(x))$. $C_{PPE}(x)$ is encrypted using a PPE scheme with a master key k. $C_{CE}(x)$ is encrypted using a classical encryption scheme (e.g., AES). The purpose of storing $C_{CE}(x)$ is to support efficient transmission of responses. For each data item x, a different data key $dk_x$ is used to generate $C_{CE}(x)$. A user with access privilege to data item x will be granted key $dk_x$. In a real implementation, the data items with the same access privileges can be grouped together into an access domain and only one key is needed for each access domain. For example, if data items x and y can be accessed by exactly the same set of users, then x and y can be in the same access domain, i.e., we can have $dk_x = dk_y$.

Request protocol Q should transfer q to DB while ensuring the correct and secure encryption of $C_{PPE}(x)$ and $C_{CE}(x)$ in the request transmission process. Note that $U_i$ can encrypt x using $dk_x$ and obtain $C_{CE}(x)$. However, since $U_i$ does not have the PPE master key k, it is not possible for $U_i$ to compute $C_{PPE}(x)$. Thus, a set of key agents (KA) is introduced to perform PPE encryption in the request protocol. Let $KA = \{KA_j \mid 1 \le j \le m\}$ denote the set of key agents. The key k is shared among the key agents such that no single entity in the system knows the master encryption key k. The user shares x and sends the shares to the key agents in KA. The key agents distributedly encrypt the shares of x with the shares of k and send the encrypted shares to DB. DB reconstructs the ciphertext $C_{PPE}(x)$ from the encrypted shares. The "distributed" encryption process is similar to the decryption process in threshold public-key cryptosystems [20, 21, 29, 30, 56, 63]. For the response protocol P, we use a simple but innovative design to achieve efficiency without going through the key agents.


6.3.2 Response Protocol

In the response protocol P, the DB simply includes $C_{CE}(y_1), C_{CE}(y_2), \dots, C_{CE}(y_t)$ in r. The user should have access rights to $y_1, y_2, \dots, y_t$ and, hence, should have the encryption keys to decrypt the data items in r. Consider the security of the system against adversary $\mathcal{A}$ (assume that $\mathcal{A}$ has not compromised the users in $U_{y_1}, \dots, U_{y_t}$, where $U_{y_j}$ is the set of users who can access $y_j$). Since the protocol only transfers $C_{CE}(y_1), C_{CE}(y_2), \dots, C_{CE}(y_t)$, $\mathcal{A}$ cannot get the encryption keys and cannot compromise $y_1, y_2, \dots, y_t$. Note that the design of P is fully discussed here and will not be discussed further.

6.3.3 Request Protocol

We design the request protocol Q, which consists of the distributed PPE protocol $P_{\mathcal{E}_d}$ and the reduction algorithm RA. In $P_{\mathcal{E}_d}$, the key agents "distributedly" evaluate the PPE algorithm $\mathcal{E}_d$ and the DB assembles the result shares into the intermediate ciphertext z. In RA, z is reduced (in size) to the single-bit ciphertext y (one bit per block of z) based on a mapping function f. In the following subsections, we introduce the primitives used in the protocol and discuss the details of $P_{\mathcal{E}_d}$ and RA.

Primitives

Here we introduce the primitives used for constructing $P_{\mathcal{E}_d}$, including the secret sharing algorithm $\Pi$ and reconstructing algorithm Re over $Z_p$, where p is a prime number; another secret sharing algorithm $\Pi'$ and reconstructing algorithm Re' over a multiplicative group G over which the decisional Diffie-Hellman (DDH) problem is hard; the hash function H mapping strings to $Z_p$; and the hash function H' mapping strings to G.


Let G be a cyclic group where $|G| = p$ and p is a prime number, and let $g \in G$ be a generator. Without loss of generality, let G be a multiplicative group with the identity 1. We assume that the decisional Diffie-Hellman (DDH) problem is hard over G, i.e., $(g, g^u, g^v, g^{uv})$ and $(g, g^u, g^v, g^w)$ are computationally indistinguishable for randomly selected $u, v, w$ from $Z_p$. Let $H: \{Null\} \cup \{0,1\}^* \to Z_p$ be a cryptographic hash function, where Null denotes the empty string. We assume that H is a random oracle, and then $R(x, k) \triangleq g^{H(x) k}$ is a pseudorandom function according to [52]. Let $H': \{0,1\}^* \to G$ be a cryptographic hash function. Since a cryptographic hash function should be collision-free, we assume that $H'(0) \neq H'(1)$.

Let $\Pi$ and Re be the sharing and reconstructing algorithms of the $(t, m)$ threshold secret sharing scheme over $Z_p$ [63]. For a secret $x \in Z_p$,

$$\Pi(x) = (s_1, \dots, s_m) \in Z_p^m,$$

where $s_i = f(i)$ and f is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying $f(0) = x$. The reconstructing algorithm computes

$$Re(s_{i_1}, \dots, s_{i_n}) = x$$

for any n ($n \ge t$) shares $s_{i_1}, \dots, s_{i_n} \in \{s_1, \dots, s_m\}$ by using Lagrange's interpolation formula. The $(t, m)$ threshold secret sharing scheme can be extended to the group G [21] [52]. Let $\Pi'$ and Re' be the sharing and reconstructing algorithms of the $(t, m)$ threshold secret sharing scheme over G. For any secret $x' \in G$,

$$\Pi'(x') = (s'_1, \dots, s'_m) \in G^m,$$

where $s'_i = x' \cdot g^{f'(i)}$ and $f'$ is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying $f'(0) = 0$. The reconstructing algorithm computes

$$Re'(s'_{i_1}, \dots, s'_{i_n}) = x'$$

for any n ($n \ge t$) shares $s'_{i_1}, \dots, s'_{i_n} \in \{s'_1, \dots, s'_m\}$ by applying Lagrange's interpolation formula to the exponents.
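The two sharing schemes can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the toy parameters P = 1019, Q = 2039 = 2P + 1 and generator 4 are assumptions chosen only so the arithmetic is easy to follow (they offer no security), and all function names are hypothetical. The modular inverse via `pow(den, -1, P)` requires Python 3.8 or later.

```python
import secrets

P = 1019   # prime order of the subgroup G (toy size, assumption for illustration)
Q = 2039   # prime modulus with Q = 2P + 1
G = 4      # generator of the order-P subgroup of Z_Q^*


def share(x, t, m):
    """Pi: shares s_i = f(i) for a random polynomial f of degree t-1 with f(0) = x."""
    coeffs = [x % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    return {i: sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P
            for i in range(1, m + 1)}


def lagrange_at_zero(indices):
    """Lagrange coefficients (mod P) for interpolating a polynomial at 0."""
    lam = {}
    for i in indices:
        num = den = 1
        for j in indices:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        lam[i] = num * pow(den, -1, P) % P
    return lam


def reconstruct(shares):
    """Re: recover f(0) from at least t shares given as {index: share}."""
    lam = lagrange_at_zero(list(shares))
    return sum(lam[i] * s for i, s in shares.items()) % P


def share_group(x, t, m):
    """Pi': shares s'_i = x * g^{f'(i)} with f'(0) = 0, for a secret x in G."""
    mask = share(0, t, m)
    return {i: x * pow(G, mask[i], Q) % Q for i in mask}


def reconstruct_group(shares):
    """Re': Lagrange interpolation applied to the exponents."""
    lam = lagrange_at_zero(list(shares))
    out = 1
    for i, s in shares.items():
        out = out * pow(s, lam[i], Q) % Q
    return out


if __name__ == "__main__":
    sh = share(123, 3, 6)
    print(reconstruct({i: sh[i] for i in (1, 4, 6)}))
    secret = pow(G, 57, Q)
    gs = share_group(secret, 3, 6)
    print(reconstruct_group({i: gs[i] for i in (2, 3, 5)}) == secret)
```

Reconstruction over G works because the Lagrange coefficients sum to 1 modulo p, so the product of the selected shares raised to those coefficients equals $x' \cdot g^{f'(0)} = x'$.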

Protocol $P_{\mathcal{E}_d}$

Suppose that the master encryption key $k \in Z_p$ is shared by $\Pi$, i.e., $\Pi(k) = (k_1, \dots, k_m)$, and the share $k_i$ is distributed to $KA_i$ for $1 \le i \le m$. We describe the protocol $P_{\mathcal{E}_d}$ to "distributedly" evaluate the PPE algorithm $\mathcal{E}_d$ in Figure 6.2. It encrypts the plaintext $x = x_1 \cdots x_l$ to the intermediate ciphertext $z = z_1 \cdots z_l$.

    Goal: distributedly encrypt x = x_1 … x_l to z = z_1 … z_l
    for i = 1 to l do
        the user shares H'(x_i) and H(x_1, …, x_{i-1}) by Π' and Π, respectively;
        let Π'(H'(x_i)) = (h'_{i1}, …, h'_{im}) and Π(H(x_1, …, x_{i-1})) = (h_{i1}, …, h_{im});
        for j = 1 to m do
            the user sends (h'_{ij}, h_{ij}) to KA_j;
        end for
    end for
    for i = 1 to l do
        for j = 1 to m do
            KA_j computes h''_{ij} = h'_{ij} ∙ g^{h_{ij} ∙ k_j} and sends it to the DB;
        end for
    end for
    for i = 1 to l do
        the DB selects n (n ≥ 2t−1) shares h''_{ij_1}, …, h''_{ij_n} and computes z_i = Re'(h''_{ij_1}, …, h''_{ij_n});
    end for
    the DB retrieves the intermediate ciphertext z = z_1 … z_l;

Figure 6.2. The Protocol $P_{\mathcal{E}_d}$.

As shown in Figure 6.2, for plaintext $x = x_1 \cdots x_l$, the user shares $H'(x_i)$ and $H(x_1, \dots, x_{i-1})$ to the key agents, the key agents distributedly encrypt them, and DB assembles the encrypted shares into an intermediate ciphertext $z = z_1 \cdots z_l$. We prove the correctness of this protocol in Lemma 6.3.1.

Lemma 6.3.1: The DB retrieves the ciphertext encrypted by $\mathcal{E}_d$ at the end of the distributed protocol $P_{\mathcal{E}_d}$.

Proof. According to $P_{\mathcal{E}_d}$, $H'(x_i)$ and $H(x_1, \dots, x_{i-1})$ are shared by the user. Let $h'_{ij} = H'(x_i) \cdot g^{f'_i(j)}$ be the shares of $H'(x_i)$, where $f'_i$ is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying $f'_i(0) = 0$, $1 \le j \le m$. Let $h_{ij} = f_i(j)$ be the shares of $H(x_1, \dots, x_{i-1})$, where $f_i$ is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying $f_i(0) = H(x_1, \dots, x_{i-1})$, $1 \le j \le m$. The key agent $KA_j$ will compute

$$h''_{ij} = h'_{ij} \cdot g^{h_{ij} \cdot k_j} = H'(x_i) \cdot g^{f'_i(j)} \cdot g^{f_i(j) \cdot k_j} = g^{\log_g H'(x_i) + f'_i(j) + f_i(j) \cdot k_j}, \quad 1 \le j \le m.$$

Notice that $\log_g H'(x_i) + f'_i(j)$ is the share of $\log_g H'(x_i)$ using a polynomial over $Z_p$ with degree $t - 1$, $1 \le j \le m$, and $f_i(j) \cdot k_j$ is the share of $H(x_1, \dots, x_{i-1}) \cdot k$ using a polynomial over $Z_p$ with degree $2t - 2$, $1 \le j \le m$. Therefore $\log_g H'(x_i) + f'_i(j) + f_i(j) \cdot k_j$ is the share of $\log_g H'(x_i) + H(x_1, \dots, x_{i-1}) \cdot k$ using a polynomial over $Z_p$ with degree $2t - 2$, $1 \le j \le m$. Hence the DB reconstructs

$$z_i = H'(x_i) \cdot R(x_1, \dots, x_{i-1}, k),$$

where $R(x_1, \dots, x_{i-1}, k) = g^{H(x_1, \dots, x_{i-1}) \cdot k}$, by using n ($n \ge 2t - 1$) shares based on Lagrange's interpolation to the exponents, $1 \le i \le l$.
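The correctness claim of Lemma 6.3.1 can be checked numerically with a small simulation. The sketch below rests on several assumptions made only for illustration: a toy Schnorr group (P = 1019, Q = 2039, generator 4), SHA-256 reduced modulo P as a stand-in for H, and a fixed two-entry table as a stand-in for H' (which is only ever applied to a single bit here). All function names are hypothetical, and the user, key agents, and DB are simulated inside one function rather than as separate parties.

```python
import hashlib
import secrets

P, Q, G = 1019, 2039, 4   # toy Schnorr group (assumption): G has order P in Z_Q^*
H_PRIME = {"0": pow(G, 1, Q), "1": pow(G, 2, Q)}   # toy stand-in for H', H'(0) != H'(1)


def H(prefix):
    """Toy stand-in for H: SHA-256 of the prefix (possibly empty), reduced mod P."""
    return int.from_bytes(hashlib.sha256(prefix.encode()).digest(), "big") % P


def share(secret, t, m):
    """(t, m) Shamir shares of `secret` over Z_P, indexed 1..m."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    return {j: sum(c * pow(j, e, P) for e, c in enumerate(coeffs)) % P
            for j in range(1, m + 1)}


def lagrange_at_zero(indices):
    """Lagrange coefficients (mod P) for interpolation at 0."""
    lam = {}
    for i in indices:
        num = den = 1
        for j in indices:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        lam[i] = num * pow(den, -1, P) % P
    return lam


def distributed_encrypt(x, key_shares, t, m):
    """Simulate the protocol: user shares, key agents combine, DB reconstructs."""
    z = []
    for i, bit in enumerate(x):
        # User: share H(x_1..x_{i-1}) over Z_P and H'(x_i) over G (mask has f'(0) = 0).
        h = share(H(x[:i]), t, m)
        mask = share(0, t, m)
        h_prime = {j: H_PRIME[bit] * pow(G, mask[j], Q) % Q for j in mask}
        # Key agent j: h''_ij = h'_ij * g^(h_ij * k_j).
        h_dd = {j: h_prime[j] * pow(G, h[j] * key_shares[j] % P, Q) % Q for j in h}
        # DB: the exponent polynomial has degree 2t-2, so 2t-1 shares are used.
        chosen = sorted(h_dd)[: 2 * t - 1]
        lam = lagrange_at_zero(chosen)
        z_i = 1
        for j in chosen:
            z_i = z_i * pow(h_dd[j], lam[j], Q) % Q
        z.append(z_i)
    return z


def direct_encrypt(x, k):
    """Reference value z_i = H'(x_i) * g^(H(x_1..x_{i-1}) * k) computed with the full key."""
    return [H_PRIME[b] * pow(G, H(x[:i]) * k % P, Q) % Q for i, b in enumerate(x)]


if __name__ == "__main__":
    k, t, m = 777, 2, 6
    key_shares = share(k, t, m)
    print(distributed_encrypt("10110", key_shares, t, m) == direct_encrypt("10110", k))
```

The comparison holds regardless of the random sharing polynomials, because every share combination interpolates to the same exponent $\log_g H'(x_i) + H(x_1, \dots, x_{i-1}) \cdot k$ at 0.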

We show that 𝒠d preserves prefix and it is secure under IND-PCPA in Lemma 6.3.2.


Lemma 6.3.2: The encryption $\mathcal{E}_d$ preserves prefix (i.e., two plaintexts share a common prefix of length i if and only if their ciphertexts share i common pre-blocks). Furthermore, it is secure under IND-PCPA.

Proof. Since H' and R are deterministic, if two plaintexts x and x' share a common prefix of length i, then the ciphertext of x and the ciphertext of x' share i common pre-blocks. Furthermore, if x and x' differ in the (i+1)-th bit, then since H'(0) ≠ H'(1), the (i+1)-th blocks of the two ciphertexts are distinct. Therefore $\mathcal{E}_d$ preserves prefix. It follows from the proofs in [48, 78] that the PPE algorithm $\mathcal{E}$ is computationally indistinguishable from the ideal PPE object. Thus it is secure under IND-PCPA according to Theorem 8.2.4. Comparing the formula for $z_i$ in $\mathcal{E}_d$ with the formula for $y_i$ in $\mathcal{E}$, the only difference is that $z_i$ is a pseudorandom block and $y_i$ is a pseudorandom bit. Hence $\mathcal{E}_d$ is also secure under IND-PCPA.

Reduction Algorithm

Since $\mathcal{E}_d$ preserves prefix, the ciphertext $z = z_1 \cdots z_l$ can already support prefix search. But $\mathcal{E}_d$ increases the size of the ciphertext since each $z_i$ is a block instead of a bit. This can impact the search performance significantly. We develop a reduction algorithm RA to reduce the intermediate ciphertext $z = z_1 \cdots z_l$ to the final ciphertext $y = y_1 \cdots y_l$, in which each $y_i$ ($1 \le i \le l$) is a single bit. We use a mapping function f to record the mapping between $z_i$ and $y_i$. For a node v, let l(v) denote the left child node, r(v) denote the right child node, vl(v) denote the edge connecting v and l(v), and vr(v) denote the edge connecting v and r(v). RA is given in Figure 6.3.


In Figure 6.3, if f has recorded that $z_i$ has already been mapped to a bit, then the mapping is retained. Otherwise, $z_i$ is mapped to a chosen bit and f records this new mapping. Lemma 6.3.3 proves the security and prefix-preserving properties of the algorithm.

    Goal: reduce the intermediate ciphertext z = z_1 … z_l to the final ciphertext y = y_1 … y_l;
    Initialization: the mapping function f = null;
    v = root; i = 1;
    while v ≠ leaf node do
        if f(vl(v)) = null & f(vr(v)) = null then
            b ←$ {0,1};
            if b = 0 then f(vl(v)) = z_i; y_i = 0; v = l(v);
            else f(vr(v)) = z_i; y_i = 1; v = r(v);
            end if
        else if f(vl(v)) ≠ null & f(vr(v)) = null then
            if z_i = f(vl(v)) then y_i = 0; v = l(v);
            else f(vr(v)) = z_i; y_i = 1; v = r(v);
            end if
        else if f(vr(v)) ≠ null & f(vl(v)) = null then
            if z_i = f(vr(v)) then y_i = 1; v = r(v);
            else f(vl(v)) = z_i; y_i = 0; v = l(v);
            end if
        else
            if z_i = f(vl(v)) then y_i = 0; v = l(v);
            else y_i = 1; v = r(v);
            end if
        end if
        i = i + 1;
    end while
    return y = y_1 … y_l;

Figure 6.3. The Reduction Algorithm RA.

Lemma 6.3.3: RA is efficient. Furthermore, the encryption algorithm RA ∘ $\mathcal{E}_d$ preserves prefix and is secure under IND-PCPA.


Proof. In RA, the mapping function f only records the mappings that have appeared at the DB. Moreover, when mapping $z = z_1 \dots z_l$ to $y = y_1 \dots y_l$, RA takes l steps. Hence RA is efficient. For two intermediate ciphertexts $z = z_1 \dots z_l$ and $z' = z'_1 \dots z'_l$, if they share i common pre-blocks, then it can be verified that RA traverses the same i nodes for z and z'; if $z_{i+1} \neq z'_{i+1}$, then it can be verified that RA traverses different (i+1)-th nodes for z and z'. Hence the first i blocks of z and z' will map to the same i bits but the (i+1)-th blocks of z and z' will map to different bits. Therefore RA ∘ $\mathcal{E}_d$ preserves prefix. The security proof is analogous to that in Lemma 6.3.2.
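The reduction step can be sketched as follows. This is an illustrative reading of Figure 6.3, not the author's implementation: the class name `Reducer` is hypothetical, tree nodes are identified by the bit path taken so far, and intermediate blocks are represented as plain integers. One deliberate difference is labeled in the code: when both edges of a node are recorded and the block matches neither, the sketch raises an error instead of defaulting to bit 1, since that case should not occur for outputs of $\mathcal{E}_d$.

```python
import secrets


class Reducer:
    """Maps intermediate block sequences z to bit strings y, keeping f across calls."""

    def __init__(self):
        # f: (path of bits chosen so far, outgoing bit) -> block recorded on that edge
        self.f = {}

    def reduce(self, z):
        path = ""
        for z_i in z:
            left = self.f.get((path, "0"))
            right = self.f.get((path, "1"))
            if left is None and right is None:
                # Neither edge is recorded: pick a random bit and record the mapping.
                bit = str(secrets.randbelow(2))
                self.f[(path, bit)] = z_i
            elif z_i == left:
                bit = "0"
            elif z_i == right:
                bit = "1"
            elif right is None:
                bit = "1"
                self.f[(path, "1")] = z_i
            elif left is None:
                bit = "0"
                self.f[(path, "0")] = z_i
            else:
                # Deviation from Figure 6.3 (assumption): treat a third block as an error.
                raise ValueError("block matches neither recorded edge")
            path += bit
        return path


def lcp_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n


if __name__ == "__main__":
    ra = Reducer()
    y1 = ra.reduce([5, 7, 9])
    y2 = ra.reduce([5, 7, 11])
    y3 = ra.reduce([5, 8, 9])
    print(lcp_len(y1, y2), lcp_len(y1, y3), ra.reduce([5, 7, 9]) == y1)
```

Blocks that agree on the first i positions follow the same path for i steps, and the first differing block is forced onto the other edge, which is the prefix-preserving behavior argued in the proof above.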

Functionalities and Security Requirements Proofs for the Protocols

In this section, we prove that our PPE protocol satisfies the functionality and security

requirements. In Theorem 6.3.4 we prove that the request protocol Q and response protocol P

satisfy the functionality requirements (1) and (2), respectively.

Theorem 6.3.4: The request protocol Q realizes the functionality requirement (1) and the

response protocol P realizes the functionality requirement (2).

Proof. According to Lemmas 6.3.1, 6.3.2, and 6.3.3, the DB receives the ciphertext of x

encrypted by the PPE RA ∘ 𝒠𝑑 in Q. Therefore the request protocol Q realizes the functionality

requirement (1). In P the returned data object y will be encrypted. The recipient user will have

the encryption key and, hence, can decrypt the ciphertext and obtain y. Hence the response

protocol P realizes the functionality requirement (2).

We adopt the security definition for multi-party computation [18, 19] to define the security

requirement for our system, which is based on real model and ideal model defined in Definition

6.3.1.


Definition 6.3.1 (Real Model and Ideal Model): The real model is exactly the request

protocol Q and response protocol P. In the ideal model, there are users, the DB, and a trusted

(incorruptible) party TP who holds the key. There are secure communication channels between

the TP and users/DB. In the ideal model the TP receives/sends the message from/to users/DB,

and does all the encryptions/decryptions needed in the protocols Q and P.

Now we define the security requirement in Definition 6.3.2. Essentially, it requires that the

real model is “equivalent” to the ideal model.

Definition 6.3.2 (Security requirement): Let $VIEW_R(Z)$ be the instance event randomly selected from the event space of what the adversary $\mathcal{A}$ can observe in the real model by compromising the entities in the set $Z \in \mathcal{Z}$. Let $VIEW_I(Z)$ be the instance event randomly selected from the event space of what the adversary $\mathcal{A}$ can observe in the ideal model by compromising the entities in the set $Z - KA$. The real model is secure if the adversary cannot retrieve more information from the real model than from the ideal model, or equivalently, if there exists a PPT simulator $\mathcal{S}$ such that $VIEW_R(Z)$ is computationally indistinguishable from $\mathcal{S}(VIEW_I(Z))$, i.e., the advantage of $\mathcal{A}$, defined by

$$\mathbf{Adv}_{\mathcal{A}} \triangleq \Pr[\mathcal{A}(VIEW_R(Z)) = 1] - \Pr[\mathcal{A}(\mathcal{S}(VIEW_I(Z))) = 1],$$

is bounded by a negligible function of the security parameter for any $Z \in \mathcal{Z}$.

We prove that our system achieves the security requirement in Theorem 6.3.5.

Theorem 6.3.5: Our system achieves the security requirement in Definition 6.3.2.

Proof. First we consider the security of Q. In both the real model and the ideal model, the adversary $\mathcal{A}$ can compromise some users and view the same thing. In the real model $\mathcal{A}$ can compromise fewer than t key agents, while in the ideal model $\mathcal{A}$ cannot compromise the trusted party TP. Since the user shares $H'(x_i)$ and $H(x_1, \dots, x_{i-1})$ to the key agents by using the $(t, m)$ secret sharing scheme and $\mathcal{A}$ compromises fewer than t shares, the view of $\mathcal{A}$ consists of random numbers and, hence, can be simulated by $\mathcal{S}$. In the real model $\mathcal{A}$ can compromise the DB and view the intermediate ciphertext $z = z_1 \cdots z_l$, the final ciphertext $y = y_1 \cdots y_l$, and the mapping function f, while in the ideal model $\mathcal{A}$ can only view the final ciphertext y. The only difference between the intermediate ciphertext z and the final ciphertext y is that $z_i$ is a random block and $y_i$ is a random bit. Therefore z can be simulated by $\mathcal{S}$ based on y. The mapping function f can be simulated accordingly based on z and y. Hence, $VIEW_R$ can be simulated by $\mathcal{S}$ based on $VIEW_I$.

Then we consider the security of P. In P only the users who have access rights to the data will have the key. Thus, the adversary $\mathcal{A}$ cannot get the encryption keys unless $\mathcal{A}$ compromises the corresponding users. Therefore P achieves the security requirement because the adversary cannot obtain more information in P than in the ideal model.

6.4 Performance Study

We conduct experiments to study the performance of the request protocol Q. Specifically, we study the performance of $\mathcal{E}_d$ since it is the dominant factor. First, we consider the secret sharing factor in $\mathcal{E}_d$. In $\mathcal{E}_d$, two secret sharing schemes are used: one is $\Pi$ and Re over $Z_p$, and the other is $\Pi'$ and Re' over G. Various groups can be used for G, and here we use the Schnorr group, i.e., G is a multiplicative subgroup of $Z_q^*$, where $|G| = p$ and p is a 256-bit prime number. We implemented the algorithms and ran them for $10^4$ trials on a PC with a 2.50GHz Intel Core 2 Duo processor. The average execution times are shown in Figure 6.4.

As shown in Figure 6.4, $\Pi'$ and Re' have a higher computation cost than $\Pi$ and Re because the computation of $\Pi'$ and Re' needs extra group operations. Since Lagrange's interpolation is linear, the reconstruction algorithm has a lower computation cost than the sharing algorithm, which requires polynomial evaluation. Both the sharing time and the reconstruction time increase when the threshold t increases (which is obvious from the sharing and reconstruction approach).

[Figure 6.4: two panels plotting time (milliseconds) against the threshold t = 2, …, 6; one panel shows Π and Re (vertical axis 0 to 0.16), the other shows Π' and Re' (vertical axis 0 to 2.5).]

Figure 6.4. Computation Cost of Secret Sharing over Zp and G (Share Number m = 6).

To factor in the communication latencies between the system entities, we allocate the user, the key agents, and the DB to different PlanetLab computers and measure the communication latencies between them. The user is in Dallas and the DB is in Los Angeles. Six key agents (i.e., m = 6) are allocated to Phoenix (Arizona), Salt Lake City (Utah), Carson City (Nevada), Eugene (Oregon), Albuquerque (New Mexico), and Denver (Colorado). Both hash functions H and H' are SHA-2. We assume that the request message without the critical data is of size 170 bytes (based on the average size of some common queries). The critical data size is l bits and we set l ∈ {8, 16, 32, 64, 128, 256, 512, 1024}. Since the threshold t should satisfy the condition $t \le \frac{m}{2} + 1$ [18, 19], we set t = 2, 3, 4. For comparison purposes, we also consider a "No Encryption" protocol and a "PPE" protocol. In "No Encryption", the user directly sends the query (with the critical data) to the DB without any encryption. In "PPE", we assume that the user has the encryption key and encrypts the critical data by the PPE constructed in [79], and then sends the query (with the encrypted critical data) to the DB. The experimental results are given in Figure 6.5 and summarized in Table 6.1.

[Figure 6.5: two panels plotting time (milliseconds) against l = 8, …, 1024; one panel shows "No Encryption" and "PPE" (vertical axis 0 to 700), the other shows E_d for t = 2, 3, 4 (vertical axis 0 to 5000).]

Figure 6.5. Encryption Cost Comparisons for Different Protocols.

As shown in Figure 6.5 and Table 6.1, the encryption cost of "No Encryption" < the encryption cost of "PPE" < the encryption cost of $\mathcal{E}_d$ for t = 2 < the encryption cost of $\mathcal{E}_d$ for t = 3 < the encryption cost of $\mathcal{E}_d$ for t = 4. The encryption costs of all protocols increase when l increases because the length of the critical data increases. "No Encryption" requires approximately 90 milliseconds, and its time increases slowly when the length of the critical data increases because it does not incur encryption overhead but only incurs communication overhead. "PPE" requires 92 milliseconds to 609 milliseconds when l increases from 8 to 1024. Relatively, $\mathcal{E}_d$ incurs a much higher encryption cost than pure "PPE", from roughly 3 times when the data size is 8 bits to more than 6 times when the data size is 1024 bits. This is because it also incurs the sharing and reconstructing cost as well as a higher communication cost due to the use of intermediate key agents. However, a multi-user PPE protocol is essential and the cost is bearable. When t increases, the computation cost of secret sharing increases and, hence, the encryption cost of $\mathcal{E}_d$ increases, but the increase is relatively slow. Thus, using additional key agents to enhance security can be a feasible method.

Table 6.1. Encryption Cost (in milliseconds) Comparisons for Different Protocols.

l (bits)   “No Encryption”   “PPE”     E_d (t=2)   E_d (t=3)   E_d (t=4)
8          88.11             92.14     305.01      306.27      309.71
16         88.16             96.22     373.55      376.05      382.94
32         88.27             104.38    483.21      488.21      502.00
64         88.47             120.70    661.87      671.88      699.46
128        88.88             153.33    960.05      980.07      1035.24
256        89.67             218.56    1471.30     1511.34     1621.67
512        91.16             348.95    2371.91     2451.98     2672.65
1024       93.86             609.45    3999.07     4159.23     4600.57
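
The relative overheads discussed with Table 6.1 can be recomputed directly from the tabulated values. The following is a small sketch rather than part of the original experiments; the function names `overhead_vs_ppe` and `threshold_growth` are illustrative assumptions, and the numbers are copied from the table.

```python
# Encryption costs in milliseconds, copied from Table 6.1.
# Columns: "No Encryption", "PPE", E_d (t=2), E_d (t=3), E_d (t=4).
TABLE_6_1 = {
    8:    (88.11, 92.14, 305.01, 306.27, 309.71),
    16:   (88.16, 96.22, 373.55, 376.05, 382.94),
    32:   (88.27, 104.38, 483.21, 488.21, 502.00),
    64:   (88.47, 120.70, 661.87, 671.88, 699.46),
    128:  (88.88, 153.33, 960.05, 980.07, 1035.24),
    256:  (89.67, 218.56, 1471.30, 1511.34, 1621.67),
    512:  (91.16, 348.95, 2371.91, 2451.98, 2672.65),
    1024: (93.86, 609.45, 3999.07, 4159.23, 4600.57),
}

# Column index of E_d for each threshold t (hypothetical helper mapping).
COLUMN_FOR_T = {2: 2, 3: 3, 4: 4}


def overhead_vs_ppe(l, t):
    """Cost of E_d with threshold t divided by the cost of plain PPE."""
    row = TABLE_6_1[l]
    return row[COLUMN_FOR_T[t]] / row[1]


def threshold_growth(l):
    """Cost of E_d at t=4 divided by its cost at t=2, for data size l."""
    row = TABLE_6_1[l]
    return row[4] / row[2]


if __name__ == "__main__":
    for l in sorted(TABLE_6_1):
        print(l, round(overhead_vs_ppe(l, 2), 2), round(threshold_growth(l), 3))
```

With these values the t = 2 overhead relative to “PPE” grows from roughly 3.3 at l = 8 to roughly 6.6 at l = 1024, while moving from t = 2 to t = 4 changes the cost by about 15% at l = 1024.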

6.5 Summary

In this chapter, we developed the first complete security proof for PPE by qualifying the
security of the ideal PPE object. We created a new security notion, IND-PCPA, and proved that
the ideal PPE object is secure under IND-PCPA and can at best reach IND-PCPA security.
We also built a protocol and extended an existing PPE scheme to support multi-user systems, in
which users do not know the master encryption key (in fact, no single entity in the system knows
the master encryption key) for encrypting and decrypting the data to be sent to and received from
the server, respectively. We addressed this challenge by designing a distributed PPE encryption
scheme for the multi-user protocol. The correctness and security of the multi-user protocol have
been proved rigorously. The performance of the protocol was studied experimentally to illustrate
its feasibility.

CHAPTER 7

SUMMARY AND FUTURE RESEARCH

With the increasing importance of data intensive applications, the design of secure storage

systems becomes a very critical issue. In many situations, the storage servers are required to

process queries issued by the users. In this case, the data should be encrypted by various special

encryption schemes to support secure computations on the encrypted data. In this PhD research,

we study secure computation techniques including HE, OPE, and PPE schemes. We investigate

the existing works on HE, OPE, and PPE schemes, and overcome some of the limitations in

existing works.

We have constructed a novel non-circuit-based HE algorithm. Our scheme is fully
homomorphic, and we have proved that its security is equivalent to the well-known large integer
factorization problem (which is also the security basis for RSA) under bounded chosen-plaintext
attacks. Our scheme yields a very practical time complexity for encryption, decryption,
and computation. Compared to Gentry’s algorithm, which is fully homomorphic and semantically
secure, our scheme is $10^7$ times faster in addition and $6 \times 10^5$ times faster in
multiplication. We have also extended our HE algorithm to handle multiple users with different
access rights, allowing individual users to encrypt the secret data when sending them with the
requests and to decrypt the ciphertexts received in the responses using individualized keys. The
request and response protocols are based on similarity transformation and can achieve the same
security as our homomorphic encryption scheme. Experiments show that our protocols have a
satisfactory performance.

We have also made significant contributions to encryption schemes that facilitate search on
ciphertexts, namely, the OPE schemes. We use information theory to analyze the security of the
ideal OPE object defined in [12]. Specifically, we derive the expected number of bits ($z_h$) of the
plaintext that can remain secret under h known plaintext attacks. The result shows that although
the adversary may retrieve some information about the plaintext, the probability for the
adversary to fully recover the plaintext is a negligible function of the security parameter. We also
show that the ideal OPE object, as defined in [12], may not be the most secure OPE. We then
present two generalized OPE (GOPE) algorithms that satisfy stronger notions of security than the
ideal OPE object. To allow multiple users to use the master key for encrypting the data sent to
the database, we have developed two digit-based request-response protocols, DOPE and OE-DOPE.
Both DOPE and OE-DOPE can be used with any existing OPE scheme to support multi-user
systems. The performance study results show that the protocols have a fairly reasonable
overhead. When the underlying OPE scheme is relatively efficient, the protocols can yield good
performance.

PPE schemes have been developed to facilitate prefix-based computations on encrypted
data. However, the attempt to qualify the security of the ideal PPE object has not been successful
in the literature. We have developed the first complete security proof to qualify the security of
the ideal PPE object. Also, we have proven that the ideal PPE object is the most secure PPE
scheme. In contrast to our proof showing that the ideal OPE object may not be the most secure
OPE, this is a positive confirmation of the security limit that any PPE scheme can achieve. We
have also converted an existing PPE scheme to a distributed algorithm, and constructed a
protocol based on the distributed PPE scheme to support multi-user systems. The correctness and
security of the multi-user PPE protocol have been proved rigorously. The performance of the
protocol was studied experimentally to illustrate its feasibility.

We plan to continue the research on the construction and, in particular, the application of
HE, OPE, and PPE schemes. First, we are interested in improving the security and performance
of these schemes. For example, the only OPE scheme that has a security proof is very inefficient,
taking over 10 seconds to encrypt or decrypt an 18-bit data object. Though it is possible to improve
its performance by partitioning the data and encrypting the individual pieces in a distributed
manner, the security of such an approach needs to be carefully analyzed. Moreover, the problem of
what the most secure OPE is, and how to construct it, is still open, and we plan to investigate it
further.

Second, we plan to apply HE schemes to practical applications. The HE scheme we have
invented has a substantial performance advantage over existing constructions. We plan to apply
it to several important application domains, such as distributed key management in large-scale
systems and privacy-preserving data mining, in which the security requirements match the
security level our HE scheme can assure. The key management tasks in very large-scale systems
may be highly demanding. For example, in future SCADA (supervisory control and data
acquisition) systems, there may be trillions of devices attached to the network, and a centralized
key manager will not be feasible. Thus, it is necessary to have many distributed key managers.
This implies that these key managers may no longer be highly trusted, i.e., it is likely that one of
them may be malicious or may have security loopholes and can be compromised more easily.
We plan to investigate techniques by which key generation and the refreshing of keying
materials can be done securely using HE algorithms.

In privacy-preserving data mining, HE and OPE schemes can be used to protect the data
and support arithmetic and comparison operations. However, the single-key problem in existing
HE and OPE schemes prevents them from being used for data mining on federated data centers,
where there are multiple data owners who would like to preserve the secrecy of their data while
allowing some data analysis agents to extract statistical information from the collective data sets.
Our multi-user HE and DOPE protocols can provide potential solutions. Specifically, a key agent
(not the data owner or the data analysis agent) can acquire the encrypted data (encrypted using
different keys) from multiple sources and transform them to ciphers with encryption key $K_{NULL}$
(no single entity in the system would have the key for decrypting ciphers encrypted using
$K_{NULL}$) following our multi-user HE and/or OPE request protocols. The transformed ciphers can
be passed to the data analysis agent to perform data mining. The analysis results can be sent back
to the key agent to be further transformed into ciphers encrypted with $K_{DAA}$ following our
multi-user HE and/or OPE response protocols. The final ciphers can be decrypted by the data
analysis agent with its individualized key $K_{DAA}$.

Even with a feasible protocol, it can still be challenging to use HE and OPE schemes to
achieve privacy-preserving data mining. In decision tree learning, it is necessary to allocate the
attributes to the nodes of a binary tree based on the information gain, which can be evaluated by
the entropy. In order to compute the entropy of an attribute, the values of the attribute and the
split point need to be compared, which requires an OPE scheme to preserve security. Arithmetic
operations are only needed for counting the values that are greater than and smaller
than the split point, which can be achieved without an HE scheme. Hence, privacy-preserving
decision tree mining can be achieved with an OPE scheme. Existing privacy-preserving decision
tree mining protocols [2] [24] mainly use perturbation techniques, which are susceptible to
data recovery attacks [43] [45]. We plan to compare the security and performance of the
privacy-preserving decision tree mining protocol based on multi-user DOPE with that based on
perturbation techniques.
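
The observation that a split evaluation needs only comparisons and counts can be illustrated with a small sketch. It is not the DOPE protocol: `toy_order_preserving_map` is a hypothetical, insecure stand-in for an OPE ciphertext (any strictly increasing map would do), the class labels are assumed to be available in the clear, and all function names are illustrative.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, split):
    """Information gain of splitting at `split`.

    The attribute values are only ever compared with the split point;
    everything else is counting, so no arithmetic on the values is needed.
    """
    left = [y for v, y in zip(values, labels) if v <= split]
    right = [y for v, y in zip(values, labels) if v > split]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def toy_order_preserving_map(x):
    """Hypothetical stand-in for an OPE ciphertext: strictly increasing, NOT secure."""
    return 7 * x ** 3 + 3 * x + 11


values = [3, 8, 1, 9, 5, 7]
labels = ["a", "b", "a", "b", "a", "b"]
split = 5

plain_gain = information_gain(values, labels, split)
masked_gain = information_gain(
    [toy_order_preserving_map(v) for v in values],
    labels,
    toy_order_preserving_map(split),
)
print(plain_gain, masked_gain)
```

Because the map preserves order, the left and right partitions are identical in both runs, so the two gains coincide.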

Some clustering techniques, such as k-means, require repeatedly computing the distance
between the data objects (n-dimensional tuples) and k centroids, assigning the data objects to the
nearest centroids, and updating the centroids. The data objects can be protected by an HE scheme
and the distances can be computed directly on the ciphertexts. The problem is how to find the
minimum distance, since the distances are encrypted by the HE algorithm. The existing
privacy-preserving k-means clustering protocols [23] [58] [70] use either Yao’s method or XOR
homomorphic secret sharing to evaluate the comparison circuit and find the minimum distance, but
these methods incur high computation and communication overheads. Although the distance
includes the relation information between the data objects and the centroids, it cannot be linked
to every single value in the data objects. Based on this consideration, the distances can be
decrypted and compared accordingly. In order to control information leakage, we plan to
incorporate perturbation techniques to protect the distance information while supporting correct
comparison on the perturbed distances. We also plan to investigate similar methods for other data
mining algorithms.

APPENDIX

A.1 Security Proof for OPE Schemes

We study the security of OPE schemes by deriving upper and lower bounds on $z_h$. In
Subsection A.1.1, we derive an upper bound on $z_h$ for any OPE scheme
$SE = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ based on a specific known plaintext attack. We then
consider the lower bound on $z_h$ based on the ideal OPE object
$SE^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$. However, since the known plaintext/ciphertext
set $KP = \{(x_i, \mathcal{E}^*(x_i, k)) \mid 1 \le i \le h\}$ is not determined, it is difficult to
derive the lower bound on $z_h = H(X_{m^*} \mid Y_{m^*,n}, KP)$ directly. Instead, we take the
following approach to get the lower bound on $z_h$. First, we consider the case $h = 0$ and derive
the lower bound on $z_0$. Let $E_h$ denote the event that the adversary reverses the ciphertext
based on $KP$. Then there is a one-to-one correspondence between $\Pr(E_h)$ and $z_h$, i.e.,
$\Pr(E_h) = 2^{-z_h}$. Therefore, the lower bound on $z_0$ can be transformed into an upper bound on
$\Pr(E_0)$. Also, note that $KP$ cuts the domain into $h + 1$ segments and the range into $h + 1$
segments, so that the OPE algorithm encrypts the plaintexts from each sub-domain to the
corresponding sub-range. Hence, we apply the upper bound on $\Pr(E_0)$ to each sub-domain and
sub-range pair in order to get an upper bound on $\Pr(E_h)$ (Subsection A.1.2). Finally, we get
the lower bound on $z_h$ by reversing the one-to-one correspondence between $\Pr(E_h)$ and $z_h$.

A.1.1 Upper Bound on $z_h$ for Any OPE Scheme

In the following lemma, we give an upper bound on $z_h$ for any OPE scheme. In this lemma
and for the remainder of this paper, the base of the logarithm operator $\log$ is 2 and the base
of the natural logarithm operator $\ln$ is $e$.

Lemma A.1.1: For any OPE scheme $SE = (\mathcal{K}, \mathcal{E}, \mathcal{D})$,
$z_h \le \log \frac{m-h}{h+1}$.

Proof. Suppose that the adversary knows the plaintext/ciphertext pairs
$\left(x_i = \frac{i(m+1)}{h+1},\ \mathcal{E}(x_i, k)\right)$, $1 \le i \le h$. Assume that $x$ is
uniformly randomly selected from $m^* = [m] - \{x_i \mid 1 \le i \le h\}$ and the ciphertext
$\mathcal{E}(x, k)$ is given to the adversary. Note that there exists $i'$ such that
\[
\mathcal{E}\left(\frac{(i'-1)(m+1)}{h+1}, k\right) < \mathcal{E}(x, k) < \mathcal{E}\left(\frac{i'(m+1)}{h+1}, k\right).
\]
Let $E$ denote the event $\frac{(i'-1)(m+1)}{h+1} < x < \frac{i'(m+1)}{h+1}$. Since the encryption
algorithm $\mathcal{E}$ preserves the order of plaintexts, the adversary can conclude that $E$ has
the min-entropy
\[
H_\infty(E) = -\log \frac{1}{\frac{i'(m+1)}{h+1} - \frac{(i'-1)(m+1)}{h+1} - 1}
= \log\left(\frac{i'(m+1)}{h+1} - \frac{(i'-1)(m+1)}{h+1} - 1\right)
= \log \frac{m-h}{h+1}.
\]
Hence $z_h \le \log \frac{m-h}{h+1}$. □
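
The counting behind Lemma A.1.1 can be checked on a concrete instance. The sketch below assumes, as the proof implicitly does, that $h + 1$ divides $m + 1$ so that the known plaintexts $x_i = i(m+1)/(h+1)$ are integers; the function name `interval_sizes` is illustrative.

```python
import math


def interval_sizes(m, h):
    """Number of unknown plaintexts strictly between consecutive known plaintexts.

    The known plaintexts are x_i = i(m+1)/(h+1) for 1 <= i <= h, with the
    sentinels 0 and m+1 marking the ends of the domain [m] = {1, ..., m}.
    """
    if (m + 1) % (h + 1) != 0:
        raise ValueError("this sketch assumes (h+1) divides (m+1)")
    step = (m + 1) // (h + 1)
    bounds = [0] + [i * step for i in range(1, h + 1)] + [m + 1]
    return [hi - lo - 1 for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    m, h = 1023, 3
    sizes = interval_sizes(m, h)
    # Every gap holds (m-h)/(h+1) candidates, so the bound is log2 of that count.
    print(sizes, math.log2(sizes[0]), math.log2((m - h) / (h + 1)))
```

For $m = 1023$ and $h = 3$ every gap contains 255 candidate plaintexts, which matches $(m-h)/(h+1)$.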

A.1.2 Lower Bound on $z_h$ for the Ideal OPE Object

We take the following steps to derive the lower bound on $z_h$ for the ideal OPE object.
First we analyze the special case where the adversary has no knowledge of any
plaintext/ciphertext pairs, i.e., $h = 0$. To do so, we first derive a formula for $z_0$, namely
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right).
\]
We then prove that there exists a constant $0 < c < 1$ such that $z_0 \ge c \log m$ for all
$n > m^2 > 1$. Thus, in the case $h = 0$, the probability for the adversary to recover $x$ is at
most $2^{-z_0} \le 2^{-c \log m} = m^{-c}$. Then we consider the case where the adversary has
knowledge of $h$ plaintext/ciphertext pairs. We derive an upper bound on $\Pr(E_h)$, i.e., the
expected probability for the adversary to fully recover a plaintext given its ciphertext and $h$
known plaintext/ciphertext pairs. Note that the known plaintext/ciphertext pairs will split the
domain and range into intervals. We first prove a lemma giving an upper bound on the expected
number of “short” intervals, and use the previous result to bound the probability on the
remaining “long” intervals. We then use these results to derive a lower bound on $z_h$. Finally,
we numerically compute the values of $c' = \frac{z_0}{\log m}$.
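
The formula for $z_0$ and the ratio $c' = z_0 / \log m$ can be evaluated directly for small parameters. The following is a minimal sketch using exact rational arithmetic; it is not the numerical procedure used in the dissertation, and the function names `z0` and `c_prime` are illustrative.

```python
from fractions import Fraction
from math import comb, log2


def z0(m, n):
    """Evaluate z_0 = -log2( (1/n) * sum_j max_i C(j-1,i-1) C(n-j,m-i) / C(n-1,m-1) )."""
    denominator = comb(n - 1, m - 1)
    total = Fraction(0)
    for j in range(1, n + 1):
        best = max(comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1))
        total += Fraction(best, denominator)
    return -log2(float(total / n))


def c_prime(m, n):
    """Ratio c' = z_0 / log2(m); only meaningful for m >= 2."""
    return z0(m, n) / log2(m)


if __name__ == "__main__":
    for m in (4, 8, 16):
        n = m * m + 1  # smallest n satisfying n > m^2
        print(m, n, z0(m, n), c_prime(m, n))
```

The loop is quadratic in the parameters (n values of j, m values of i), so this direct evaluation is only practical for small domains.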

The Case h = 0

We begin by proving a lower bound on $z_0$ and the corresponding upper bound on $\Pr(E_0)$.
To do so, we first derive the formula of $z_0$ in the following Lemma A.1.2.

Lemma A.1.2: For the ideal OPE object,
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right).
\]

Proof. Let $SIF_{m,n}(i, j) = \{f \in SIF_{m,n} \mid f(i) = j\}$. Then
\[
|SIF_{m,n}(i, j)| = \binom{j-1}{i-1}\binom{n-j}{m-i}.
\]
Let $SIF_{m,n}(j) = \{f \in SIF_{m,n} \mid \exists i \in [m] \text{ s.t. } f(i) = j\}$. Then
\[
|SIF_{m,n}(j)| = \sum_{i \in [m]} |SIF_{m,n}(i, j)| = \sum_{i \in [m]} \binom{j-1}{i-1}\binom{n-j}{m-i} = \binom{n-1}{m-1}.
\]
Let $F_{i,j}$ be a uniform random variable on $SIF_{i,j}$. Then
\[
\Pr(Y_{m,n} = j) = \sum_{f \in SIF_{m,n}} \Pr(Y_{m,n} = j \mid F_{m,n} = f) \Pr(F_{m,n} = f)
= \frac{\sum_{f \in SIF_{m,n}} \Pr(Y_{m,n} = j \mid F_{m,n} = f)}{\binom{n}{m}}
\]
\[
= \frac{\sum_{f \in SIF_{m,n}(j)} \Pr(Y_{m,n} = j \mid F_{m,n} = f)}{\binom{n}{m}}
= \frac{\sum_{f \in SIF_{m,n}(j)} \frac{1}{m}}{\binom{n}{m}}
= \frac{|SIF_{m,n}(j)|}{m\binom{n}{m}}
= \frac{\binom{n-1}{m-1}}{m\binom{n}{m}}
= \frac{1}{n}.
\]
Note that $\Pr(Y_{m,n} = j \mid F_{m,n} = f) = m^{-1}$ since for $f \in SIF_{m,n}(j)$, there exists
$x_{f,j}$ s.t. $f(x_{f,j}) = j$. Since $\Pr(X_m = x_{f,j}) = m^{-1}$ and $Y_{m,n} = F_{m,n}(X_m)$,
$\Pr(Y_{m,n} = j \mid F_{m,n} = f) = m^{-1}$.

Note that
\[
\Pr(X_m = i \mid Y_{m,n} = j) = \frac{|SIF_{m,n}(i, j)|}{|SIF_{m,n}(j)|} = \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}.
\]
Hence,
\[
H_\infty(X_m \mid Y_{m,n}) = -\log \sum_{j \in [n]} \Pr(Y_{m,n} = j)\, 2^{-H_\infty(X_m \mid Y_{m,n} = j)}
\]
\[
= -\log \sum_{j \in [n]} \Pr(Y_{m,n} = j) \max_{i \in [m]} \Pr(X_m = i \mid Y_{m,n} = j)
\]
\[
= -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \Pr(X_m = i \mid Y_{m,n} = j) \right)
\]
\[
= -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right). \qquad □
\]
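
The two counting identities in the proof of Lemma A.1.2 can be confirmed by brute force on a small instance. This sketch assumes that $SIF_{m,n}$ denotes the set of strictly increasing functions from $[m]$ to $[n]$, which is consistent with the counting formulas above; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def strictly_increasing_functions(m, n):
    """Each sorted m-subset of {1, ..., n} is one strictly increasing f: [m] -> [n]."""
    return list(combinations(range(1, n + 1), m))


def count_with_f_i_equal_j(functions, i, j):
    """Brute-force |SIF(i, j)|: functions f with f(i) = j (1-based i)."""
    return sum(1 for f in functions if f[i - 1] == j)


def check_identities(m, n):
    """Return True if both counting identities hold for every i and j."""
    functions = strictly_increasing_functions(m, n)
    for j in range(1, n + 1):
        row_total = 0
        for i in range(1, m + 1):
            counted = count_with_f_i_equal_j(functions, i, j)
            if counted != comb(j - 1, i - 1) * comb(n - j, m - i):
                return False
            row_total += counted
        # Summing over i gives the number of functions that hit j at all.
        if row_total != comb(n - 1, m - 1):
            return False
    return True


if __name__ == "__main__":
    print(check_identities(3, 7))
```

Since there are $\binom{n}{m}$ functions in total and each hits $j$ with probability $1/m$ when it contains $j$, the second identity is what yields $\Pr(Y_{m,n} = j) = 1/n$.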

Now we derive the lower bound on $z_0$ by proving that for the ideal OPE object, there
exists a constant $0 < c < 1$ such that $z_0 \ge c \log m$ for $n > m^2 > 1$. We first prove five
technical lemmas. Note that there is a max function over $i \in [m]$ in the formula of $z_0$. In
Lemma A.1.3 we show that the maximum value can be achieved only if
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$.
The next two lemmas, Lemma A.1.4 and Lemma A.1.5, are preparations for Lemma A.1.6.
Lemma A.1.4 gives an estimate on $1 + x$, which follows from Taylor expansion. Then Lemma
A.1.5 gives an estimate on $\binom{x}{y}$, which is proved by applying Lemma A.1.4 twice. Finally, in
Lemma A.1.6, we apply the conclusions of Lemma A.1.4 and Lemma A.1.5 to prove a bound
on each term of the hypergeometric distribution, which is then simplified. Summation will
prove $z_0 \ge c \log m$ with the given conditions $n > m^2$ and $m > m_c$. We will use the conclusion
of Lemma A.1.7 to show that we can in fact choose $m_c = 1$.

Lemma A.1.3: Given $j, m, n$, the term $\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}$
achieves the maximum value only if $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$.

Proof. We assume that $\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}$ achieves the
maximum value at $i$. Then
\[
\frac{\binom{j-1}{i-2}\binom{n-j}{m-i+1}}{\binom{n-1}{m-1}} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\quad\text{and}\quad
\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \ge \frac{\binom{j-1}{i}\binom{n-j}{m-i-1}}{\binom{n-1}{m-1}}.
\]
Therefore,
\[
\frac{\binom{j-1}{i-2}\binom{n-j}{m-i+1}}{\binom{n-1}{m-1}} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\;\Rightarrow\; \frac{1}{(j-i+1)(m-i+1)} \le \frac{1}{(i-1)(n-j-m+i)}
\;\Rightarrow\; i \le \frac{mj}{n+1} + 1
\]
and
\[
\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \ge \frac{\binom{j-1}{i}\binom{n-j}{m-i-1}}{\binom{n-1}{m-1}}
\;\Rightarrow\; \frac{1}{(j-i)(m-i)} \ge \frac{1}{i(n-j-m+i+1)}
\;\Rightarrow\; i \ge \frac{mj}{n+1}.
\]
Hence $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. □
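
Lemma A.1.3 can be spot-checked exhaustively for small parameters. The sketch below is only a sanity check over a finite range, not a proof; the function names are illustrative, and the interval test is written in integer arithmetic to avoid rounding.

```python
from math import comb


def maximizers(m, n, j):
    """All i in [m] at which C(j-1, i-1) * C(n-j, m-i) is maximal."""
    terms = {i: comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1)}
    best = max(terms.values())
    return [i for i, value in terms.items() if value == best]


def in_lemma_interval(m, n, j, i):
    """Check mj/(n+1) <= i <= mj/(n+1) + 1 without floating point."""
    return m * j <= i * (n + 1) and (i - 1) * (n + 1) <= m * j


def lemma_holds_up_to(max_m, max_n):
    """True if every maximizer lies in the interval for all m <= max_m, m <= n <= max_n."""
    return all(
        in_lemma_interval(m, n, j, i)
        for m in range(1, max_m + 1)
        for n in range(m, max_n + 1)
        for j in range(1, n + 1)
        for i in maximizers(m, n, j)
    )


if __name__ == "__main__":
    print(lemma_holds_up_to(7, 15))
```

The practical consequence is that the inner maximum in the formula for $z_0$ never requires scanning all of $[m]$: at most two candidate values of $i$ need to be examined for each $j$.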

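Lemma A.1.4: For every $\varepsilon > 0$, there exist $x_\varepsilon > 0$ and
$\left|\alpha + \frac{e}{2}\right| < \varepsilon$ such that for all $|x^{-1}| \ge x_\varepsilon$,
\[
1 + x = (e + \alpha x)^x.
\]

Proof. This is a consequence of Taylor expansion. We have
\[
\left(1 + \frac{1}{x}\right)^x = e^{x \ln\left(1 + \frac{1}{x}\right)}
= e^{x\left(\frac{1}{x} - \frac{1}{2x^2} + o(x^{-2})\right)}
= e^{1 - \frac{1}{2x} + o(x^{-1})}
= e \cdot e^{-\frac{1}{2x} + o\left(\frac{1}{x}\right)}
\]
\[
= e - \frac{e}{2x} + o(x^{-1})
= e + \frac{1}{x}\left(-\frac{e}{2} + o(1)\right). \qquad □
\]

Note that in Lemma A.1.4, $\alpha$ is negative for small $\varepsilon$. Also, $\alpha$ is not a
constant, but depends on $x$.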
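
The behaviour of $\alpha$ in Lemma A.1.4 can be observed numerically. Solving $1 + x = (e + \alpha x)^x$ for $\alpha$ gives $\alpha = \left((1+x)^{1/x} - e\right)/x$, and the sketch below evaluates this for small $|x|$; the function name `alpha` is illustrative.

```python
from math import e, exp, log1p


def alpha(x):
    """Solve 1 + x = (e + alpha * x) ** x for alpha, for small nonzero x > -1."""
    # exp(log1p(x) / x) equals (1 + x) ** (1 / x), computed with less rounding error.
    return (exp(log1p(x) / x) - e) / x


if __name__ == "__main__":
    for x in (1e-1, 1e-2, 1e-3, 1e-4, -1e-4):
        print(x, alpha(x), alpha(x) + e / 2)
```

As $|x^{-1}|$ grows, the printed difference $\alpha + e/2$ shrinks toward zero, and $\alpha$ stays negative, in line with the remark following the lemma.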

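Lemma A.1.5: For every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and
$y_\varepsilon > 0$ such that for $x \ge y^2$ and $y \ge y_\varepsilon$,
\[
c_\varepsilon \frac{\left(\frac{ex}{y}\right)^y}{\sqrt{2\pi y}} \le \binom{x}{y} \le e^{\varepsilon} \frac{\left(\frac{ex}{y}\right)^y}{\sqrt{2\pi y}}.
\]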

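Proof. According to Stirling’s formula,
\[
x! = \sqrt{2\pi x}\left(\frac{x}{e}\right)^x e^{\lambda_x}
\quad\text{where}\quad
\frac{1}{12x+1} < \lambda_x < \frac{1}{12x}.
\]
Hence
\[
\binom{x}{y} = \frac{x!}{y!(x-y)!}
= \frac{\sqrt{2\pi x}\left(\frac{x}{e}\right)^x e^{\lambda_x}}{\sqrt{2\pi y}\left(\frac{y}{e}\right)^y e^{\lambda_y} \sqrt{2\pi(x-y)}\left(\frac{x-y}{e}\right)^{x-y} e^{\lambda_{x-y}}}
\]
\[
= \frac{\sqrt{\frac{x}{x-y}}\left(\frac{x}{y}\right)^y\left(\frac{x}{x-y}\right)^{x-y}}{\sqrt{2\pi y}}\, e^{\lambda_x - \lambda_y - \lambda_{x-y}}
= \frac{\left(\frac{x}{y}\right)^y\left(\frac{x}{x-y}\right)^{x-y+\frac12}}{\sqrt{2\pi y}}\, e^{\lambda_x - \lambda_y - \lambda_{x-y}}.
\]
We now apply Lemma A.1.4 twice. For every $\varepsilon > 0$ there exist $x_\varepsilon > 0$ and
$\left|\alpha + \frac{e}{2}\right| < \varepsilon$ such that for
$\left|\frac{x-y}{y}\right| \ge x_\varepsilon$ and $\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$,
\[
\left(\frac{x}{x-y}\right)^{x-y+\frac12} = \left(1 + \frac{y}{x-y}\right)^{x-y+\frac12}
= \left(e + \frac{\alpha y}{x-y}\right)^{\frac{y}{x-y}\left(x-y+\frac12\right)}
= \left(e + \frac{\alpha y}{x-y}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
\]
where $\alpha$ depends on $\frac{y}{x-y}$. We further have
\[
\left(e + \frac{\alpha y}{x-y}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(1 + \frac{\alpha y}{e(x-y)}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
\]
\[
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(e + \alpha\frac{\alpha y}{e(x-y)}\right)^{\frac{\alpha y}{e(x-y)}\, y\left(1+\frac{1}{2(x-y)}\right)}
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(e + \frac{\alpha^2 y}{e(x-y)}\right)^{\frac{\alpha y^2\left(1+\frac{1}{2(x-y)}\right)}{e(x-y)}},
\]
where the second $\alpha$ depends on $\frac{\alpha y}{e(x-y)}$. For simplicity we denote the
multiplication of the two $\alpha$’s as $\alpha^2$. What matters here is that the two $\alpha$’s
are bounded. For $\varepsilon > 0$, there exists $x'_\varepsilon > 0$ such that for
$\frac{x}{y} \ge x'_\varepsilon$,
\[
e^y < e^{y\left(1+\frac{1}{2(x-y)}\right)} = e^{y + \frac{y}{2(x-y)}} \le e^{y+\varepsilon}
\]
and
\[
e \le e + \frac{\alpha^2 y}{e(x-y)} = e + \frac{\alpha^2 \frac{y}{x}}{e\left(1 - \frac{y}{x}\right)} \le e + \varepsilon,
\]
and for $x - y \ge x'_\varepsilon$, $\frac{x}{y} \ge x'_\varepsilon$, and $x \ge y^2$,
\[
\frac{\left(-\frac{e}{2} - \varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}
\le \frac{\alpha \frac{y^2}{x}\left(1 + \frac{1}{2(x-y)}\right)}{e\left(1 - \frac{y}{x}\right)} \le 0.
\]
Therefore, for $\left|\frac{x-y}{y}\right| \ge x_\varepsilon$,
$\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$, $x - y \ge x'_\varepsilon$,
$\frac{x}{y} \ge x'_\varepsilon$, and $x \ge y^2$,
\[
e^y (e + \varepsilon)^{\frac{\left(-\frac{e}{2}-\varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}}
\le \left(\frac{x}{x-y}\right)^{x-y+\frac12} \le e^{y+\varepsilon}.
\]
Also, there exists $x''_\varepsilon > 0$ such that, for $x \ge x''_\varepsilon$,
$y \ge x''_\varepsilon$, and $x - y \ge x''_\varepsilon$,
\[
1 - \varepsilon \le e^{\lambda_x - \lambda_y - \lambda_{x-y}} < 1.
\]
For $x_\varepsilon$, $x'_\varepsilon$, and $x''_\varepsilon$, there exists $y_\varepsilon > 0$ such
that for $x \ge y^2$ and $y \ge y_\varepsilon$, all of the previous constraints hold, i.e.,
$\left|\frac{x-y}{y}\right| \ge x_\varepsilon$, $\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$,
$x - y \ge x'_\varepsilon$, $\frac{x}{y} \ge x'_\varepsilon$, $x \ge x''_\varepsilon$,
$y \ge x''_\varepsilon$, $x - y \ge x''_\varepsilon$.
Hence for every $\varepsilon > 0$, there exist
$0 < c_\varepsilon = (1-\varepsilon)(e+\varepsilon)^{\frac{\left(-\frac{e}{2}-\varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}} < 1$
and $y_\varepsilon > 0$ such that, for $x \ge y^2$ and $y \ge y_\varepsilon$,
\[
c_\varepsilon \frac{\left(\frac{ex}{y}\right)^y}{\sqrt{2\pi y}} \le \binom{x}{y} \le e^{\varepsilon} \frac{\left(\frac{ex}{y}\right)^y}{\sqrt{2\pi y}}. \qquad □
\]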
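
The estimate of Lemma A.1.5 can be examined numerically by computing the ratio between $\binom{x}{y}$ and $\left(\frac{ex}{y}\right)^y / \sqrt{2\pi y}$ on the boundary case $x = y^2$. The sketch below works in log space to avoid overflow; the function name `stirling_ratio` is illustrative.

```python
from math import exp, lgamma, log, pi


def stirling_ratio(x, y):
    """binom(x, y) divided by (e*x/y)**y / sqrt(2*pi*y), computed via logarithms."""
    log_binom = lgamma(x + 1) - lgamma(y + 1) - lgamma(x - y + 1)
    log_estimate = y * (1 + log(x / y)) - 0.5 * log(2 * pi * y)
    return exp(log_binom - log_estimate)


if __name__ == "__main__":
    for y in (50, 200, 1000):
        print(y, stirling_ratio(y * y, y), exp(-0.5))
```

On $x = y^2$ the ratio stays strictly between 0 and 1 and approaches $e^{-1/2}$, which is the value that $c_\varepsilon$ takes as $\varepsilon \to 0$.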

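Lemma A.1.6: Let $\frac12 < \sigma < 1$, $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$,
and $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. Then for every $\varepsilon > 0$ there
exist $c_{\varepsilon,\sigma,1}$, $c_{\varepsilon,\sigma,2}$, and $m_{\varepsilon,\sigma}$ such that
\[
c_{\varepsilon,\sigma,1}\, m^{-\frac12} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}},
\]
for $n > m^2$ and $m \ge m_{\varepsilon,\sigma}$.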

Proof. By Lemma A.1.5, we have the following bounds:

(1) For every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such
that for $j - 1 \ge (i-1)^2$ and $i - 1 \ge y_\varepsilon$,
\[
c_\varepsilon \frac{\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}} \le \binom{j-1}{i-1} \le e^{\varepsilon} \frac{\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}}.
\]

(2) For every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such
that for $n - j \ge (m-i)^2$ and $m - i \ge y_\varepsilon$,
\[
c_\varepsilon \frac{\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}} \le \binom{n-j}{m-i} \le e^{\varepsilon} \frac{\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}}.
\]

(3) For every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such
that for $n - 1 \ge (m-1)^2$ and $m - 1 \ge y_\varepsilon$,
\[
c_\varepsilon \frac{\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}} \le \binom{n-1}{m-1} \le e^{\varepsilon} \frac{\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}}.
\]

For (1), we need to derive the condition such that $j - 1 \ge (i-1)^2$ and
$i - 1 \ge y_\varepsilon$ hold. Since $\frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1$, it implies that
$(i-1)^2 \le \left(\frac{mj}{n+1}\right)^2$. Hence it suffices to derive the condition such that
$\left(\frac{mj}{n+1}\right)^2 \le j - 1$ holds. Note that
\[
\left(\frac{mj}{n+1}\right)^2 \le j - 1 \;\Leftrightarrow\; (n+1)^2(j-1) - (mj)^2 \ge 0,
\]
and the dominant term in $(n+1)^2(j-1) - (mj)^2$ is $n^2 j - (mj)^2$. Since
$\frac{n}{m^\sigma} \le j \le n - \frac{n}{m^\sigma}$ and $n > m^2$, we have
\[
n^2 j - (mj)^2 > (n - m^2) j^2 \ge j^2 \ge \left(\frac{n}{m^\sigma}\right)^2 > \left(\frac{n}{m}\right)^2 > m^2.
\]
It implies that $j - 1 \ge (i-1)^2$ holds for sufficiently large $m$. Furthermore, since
\[
i \ge \frac{mj}{n+1} \ge \frac{mn}{(n+1)m^\sigma} \ge \frac{m^{1-\sigma}}{2},
\]
it implies that $i$ converges to infinity when $m$ converges to infinity. Therefore there exists
$m_{\sigma,\varepsilon,1} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,1}$,
\[
j - 1 \ge (i-1)^2 \quad\text{and}\quad i - 1 \ge y_\varepsilon.
\]

For (2), we need to derive the condition such that $n - j \ge (m-i)^2$ and
$m - i \ge y_\varepsilon$ hold. Since $\frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1$, it implies that
$(m-i)^2 \le \left(m - \frac{mj}{n+1}\right)^2$. Hence it suffices to derive the condition such that
$\left(m - \frac{mj}{n+1}\right)^2 \le n - j$ holds. We have
\[
\left(m - \frac{mj}{n+1}\right)^2 \le n - j \;\Leftrightarrow\; (n-j)(n+1)^2 - (m(n+1) - mj)^2 \ge 0.
\]
Since $\frac{n}{m^\sigma} \le j \le n - \frac{n}{m^\sigma}$, the dominant term in
$(n-j)(n+1)^2 - (m(n+1) - mj)^2$ is $n^3 - (mn)^2$. Since $n > m^2$,
\[
n^3 - (mn)^2 = (n - m^2) n^2 \ge n^2 > m^4.
\]
It implies that $n - j \ge (m-i)^2$ holds for sufficiently large $m$. Furthermore, since
\[
m - i \ge m - \frac{mj}{n+1} - 1 \ge m - \frac{m\left(n - \frac{n}{m^\sigma}\right)}{n} - 1 = m^{1-\sigma} - 1,
\]
it implies that $m - i$ converges to infinity when $m$ converges to infinity. Therefore there
exists $m_{\sigma,\varepsilon,2} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,2}$,
\[
n - j \ge (m-i)^2 \quad\text{and}\quad m - i \ge y_\varepsilon.
\]

For (3), we have
\[
n - 1 > m^2 - 1 > m^2 - 2m + 1 = (m-1)^2.
\]
Therefore there exists $m_{\varepsilon,3} > 0$ such that for $n > m^2$ and $m \ge m_{\varepsilon,3}$,
\[
n - 1 \ge (m-1)^2 \quad\text{and}\quad m - 1 \ge y_\varepsilon.
\]

Therefore there exists $m_{\sigma,\varepsilon,4} > 0$ such that the estimation of Lemma A.1.5 can
be applied to $\binom{j-1}{i-1}$, $\binom{n-j}{m-i}$, and $\binom{n-1}{m-1}$ for
$j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$,
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$, $n > m^2$, and
$m \ge m_{\sigma,\varepsilon,4}$. Let
\[
T = \frac{\dfrac{\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}} \cdot \dfrac{\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}}}{\dfrac{\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}}}
= \frac{1}{\sqrt{2\pi}} \cdot \sqrt{\frac{m-1}{(i-1)(m-i)}} \left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1} \left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}.
\]
Then
\[
\frac{c_\varepsilon^2}{e^{\varepsilon}} \cdot T \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le \frac{e^{2\varepsilon}}{c_\varepsilon} \cdot T.
\]
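In order to estimate $T$, we further let $T_1 = \sqrt{\frac{m-1}{(i-1)(m-i)}}$ and
$T_2 = T_{21} T_{22}$, where
$T_{21} = \left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1}$ and
$T_{22} = \left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}$. Thus we need to estimate
$T_1$ and $T_2$.

We first consider $T_1$. Since
$\frac{m \frac{n}{m^\sigma}}{n+1} \le \frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1 \le \frac{m\left(n - \frac{n}{m^\sigma}\right)}{n+1} + 1$,
we have $\frac{m^{1-\sigma}}{2} \le i \le m - m^{1-\sigma} + 1$. Note that
$(i-1)(m-i) = -\left(i - \frac{m+1}{2}\right)^2 + \left(\frac{m+1}{2}\right)^2 - m$. Consequently
\[
\frac{m^{2-\sigma}}{4} \le \min\left\{\left(\frac{m^{1-\sigma}}{2} - 1\right)\left(m - \frac{m^{1-\sigma}}{2}\right),\ (m - m^{1-\sigma})(m^{1-\sigma} - 1)\right\}
\le (i-1)(m-i) \le \left(\frac{m+1}{2}\right)^2 - m = \frac{(m-1)^2}{4}
\]
for sufficiently large $m$. Therefore there exists $m_{\sigma,\varepsilon,5} > 0$ such that
\[
\frac{2}{\sqrt{m}} \le \frac{2}{\sqrt{m-1}} \le \sqrt{\frac{m-1}{\frac{(m-1)^2}{4}}} \le T_1 = \sqrt{\frac{m-1}{(i-1)(m-i)}} \le \sqrt{\frac{m-1}{\frac{m^{2-\sigma}}{4}}} \le \frac{2}{m^{\frac{1-\sigma}{2}}}
\]
for $m \ge m_{\sigma,\varepsilon,5}$.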

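Now consider $T_2 = T_{21} T_{22}$. Denote $j = \frac{n}{2} + u$ where
$-\frac{n}{2} + \frac{n}{m^\sigma} \le u \le \frac{n}{2} - \frac{n}{m^\sigma}$; then
$-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$. Denote
$i = \frac{mj}{n} + v = \left(\frac12 + \frac{u}{n}\right)m + v$. Since
$\frac{mj}{n+1} \le i = \frac{mj}{n} + v \le \frac{mj}{n+1} + 1$, we have
$-\frac{1}{m} \le -\frac{m}{n+1} \le -\frac{mj}{n(n+1)} \le v \le -\frac{mj}{n(n+1)} + 1 \le 1$. Then
\[
T_{21} = \left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1}
= \left(\frac{\frac{n}{2} + u - 1}{n-1} \cdot \frac{m-1}{\left(\frac12 + \frac{u}{n}\right)m + v - 1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
\]
\[
= \left(\left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)\left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
\]
\[
= \left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1} \cdot \left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}.
\]
There exists $m_{\sigma,\varepsilon,6} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,6}$,
\[
1 - \varepsilon < \left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1} < 1 + \varepsilon.
\]
For sufficiently large $m$,
$\left| -\frac{m + \frac{v-1}{\frac12 + \frac{u}{n}}}{\frac{v-1}{\frac12 + \frac{u}{n}} + 1} \right| \ge x_\varepsilon$.
Then, by Lemma A.1.4,
\[
\left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
= \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-\frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\left[\left(\frac12 + \frac{u}{n}\right)m + v - 1\right]}
\]
\[
= \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-v + \frac12 - \frac{u}{n}}.
\]
Since $-\frac{1}{m} \le v \le 1$ and
$-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$, we have
$-\frac32 \le -\frac32 + \frac{1}{m^\sigma} \le -v + \frac12 - \frac{u}{n} \le \frac12 + \frac{1}{m} - \frac{1}{m^\sigma} \le \frac12$.
Hence there exists $m_{\sigma,\varepsilon,7} > 0$ such that for $n > m^2$ and
$m \ge m_{\sigma,\varepsilon,7}$,
\[
e^{-\frac32} - \varepsilon \le e^{-v + \frac12 - \frac{u}{n}} - \varepsilon
\le \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-v + \frac12 - \frac{u}{n}}
\le e^{-v + \frac12 - \frac{u}{n}} + \varepsilon \le e^{\frac12} + \varepsilon.
\]
Hence for $n > m^2$ and $m \ge \max\{m_{\sigma,\varepsilon,6}, m_{\sigma,\varepsilon,7}\}$,
\[
(1 - \varepsilon)\left(e^{-\frac32} - \varepsilon\right) \le T_{21} \le (1 + \varepsilon)\left(e^{\frac12} + \varepsilon\right).
\]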

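Similarly,
\[
T_{22} = \left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}
= \left(\frac{\frac{n}{2} - u}{n-1} \cdot \frac{m-1}{\left(\frac12 - \frac{u}{n}\right)m - v}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
\]
\[
= \left(\left(1 + \frac{1}{n-1}\right)\left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
= \left(1 + \frac{1}{n-1}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v} \left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}.
\]
Then there exists $m_{\sigma,\varepsilon,8} > 0$ such that for $n > m^2$ and
$m \ge m_{\sigma,\varepsilon,8}$,
\[
1 - \varepsilon < \left(1 + \frac{1}{n-1}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v} < 1 + \varepsilon.
\]
For sufficiently large $m$,
$\left| \frac{m - \frac{v}{\frac12 - \frac{u}{n}}}{\frac{v}{\frac12 - \frac{u}{n}} - 1} \right| \ge x_\varepsilon$.
Then, by Lemma A.1.4,
\[
\left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
= \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\left[\left(\frac12 - \frac{u}{n}\right)m - v\right]}
\]
\[
= \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{v - \frac12 + \frac{u}{n}}.
\]
Since $-\frac{1}{m} \le v \le 1$ and
$-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$, we have
$-1 \le -1 - \frac{1}{m} + \frac{1}{m^\sigma} \le v - \frac12 + \frac{u}{n} \le 1 - \frac{1}{m^\sigma} \le 1$.
Hence there exists $m_{\sigma,\varepsilon,9} > 0$ such that for $n > m^2$ and
$m \ge m_{\sigma,\varepsilon,9}$,
\[
e^{-1} - \varepsilon \le e^{v - \frac12 + \frac{u}{n}} - \varepsilon
\le \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{v - \frac12 + \frac{u}{n}}
\le e^{v - \frac12 + \frac{u}{n}} + \varepsilon \le e + \varepsilon.
\]
Hence for $n > m^2$ and $m \ge \max\{m_{\sigma,\varepsilon,8}, m_{\sigma,\varepsilon,9}\}$,
\[
(1 - \varepsilon)\left(e^{-1} - \varepsilon\right) \le T_{22} \le (1 + \varepsilon)(e + \varepsilon).
\]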

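Consequently
\[
(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)\left(e^{-1} - \varepsilon\right) \le T_2 = T_{21} T_{22} \le (1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon).
\]
Hence for every $\varepsilon > 0$ and $\frac12 < \sigma < 1$, there exists
$m_{\sigma,\varepsilon,10} > 0$ such that
\[
\frac{2(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)\left(e^{-1} - \varepsilon\right)}{\sqrt{2\pi m}}
\le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\le \frac{2(1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon)}{\sqrt{2\pi m^{1-\sigma}}},
\]
for $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$,
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$, $n > m^2$, and
$m \ge m_{\sigma,\varepsilon,10}$. Let
\[
c_{\varepsilon,\sigma,1} = \frac{2(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)\left(e^{-1} - \varepsilon\right)}{\sqrt{2\pi}}
\quad\text{and}\quad
c_{\varepsilon,\sigma,2} = \frac{2(1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon)}{\sqrt{2\pi}}.
\]
Then for $\frac12 < \sigma < 1$, $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$,
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$, $n > m^2$, and
$m \ge m_{\sigma,\varepsilon,10}$,
\[
c_{\varepsilon,\sigma,1}\, m^{-\frac12} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}}. \qquad □
\]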
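
The two-sided bound of Lemma A.1.6 is asymptotic, but its shape can be observed on a moderate instance. The sketch below evaluates the hypergeometric term over the ranges of $j$ and $i$ named in the lemma and rescales it by $m^{1/2}$ and $m^{(1-\sigma)/2}$. The choice $m = 30$, $n = m^2 + 1$, $\sigma = 0.75$ is an arbitrary assumption, the constants are taken at $\varepsilon = 0$ purely for comparison, and nothing here establishes the threshold $m_{\varepsilon,\sigma}$.

```python
from math import ceil, comb, exp, floor, pi, sqrt


def term(m, n, j, i):
    """Hypergeometric term C(j-1,i-1) C(n-j,m-i) / C(n-1,m-1)."""
    return comb(j - 1, i - 1) * comb(n - j, m - i) / comb(n - 1, m - 1)


def scaled_extremes(m, n, sigma):
    """Smallest term times m**(1/2) and largest term times m**((1-sigma)/2).

    j ranges over [n/m**sigma, n - n/m**sigma] and i over the integers in
    [mj/(n+1), mj/(n+1) + 1], as in the statement of Lemma A.1.6.
    """
    lo_j = ceil(n / m ** sigma)
    hi_j = floor(n - n / m ** sigma)
    values = [
        term(m, n, j, i)
        for j in range(lo_j, hi_j + 1)
        for i in range(1, m + 1)
        if m * j <= i * (n + 1) and (i - 1) * (n + 1) <= m * j
    ]
    return min(values) * sqrt(m), max(values) * m ** ((1 - sigma) / 2)


# Constants of the lemma evaluated at epsilon = 0 (for comparison only).
C1_AT_ZERO = 2 * exp(-1.5) * exp(-1.0) / sqrt(2 * pi)
C2_AT_ZERO = 2 * exp(0.5) * exp(1.0) / sqrt(2 * pi)

if __name__ == "__main__":
    m = 30
    lower, upper = scaled_extremes(m, m * m + 1, 0.75)
    print(C1_AT_ZERO, lower, upper, C2_AT_ZERO)
```

On this instance the rescaled values sit comfortably between the two constants, which suggests that the bounds are loose rather than tight for moderate $m$.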

Lemma A.1.7: For any mc > 0, there exists nc > 0 and 0 < cmc,nc < 1 such that

z0 ≥ cmc,nc logm

for m ≤ mc and n ≥ nc.

Proof. Note that(j−1i−1

)(n−jm−i

)(n−1m−1

) =

(j−1)...(j−i+1)(i−1)!

(n−j)...(n−j−m+i+1)(m−i)!

(n−1)...(n−m+1)(m−1)!

=(m− 1)!

(i− 1)!(m− i)!( jn− 1

n)...( j

n− i−1

n)n−jn...(n−j

n− m−i−1

n)

(1− 1n)...(1− m−1

n)

.

Since 1 ≤ i ≤ m ≤ mc, for ε = (12)2mc−1 there exists nc > 0 such that

(m− 1)!

(i− 1)!(m− i)!(j

n)i−1(

n− jn

)m−i − ε ≤(j−1i−1

)(n−jm−i

)(n−1m−1

) ≤ (m− 1)!

(i− 1)!(m− i)!(j

n)i−1(

n− jn

)m−i + ε

for m ≤ mc and n ≥ nc. Therefore

z0 = − log n−1∑j∈[n]

maxi∈[m]

(j−1i−1

)(n−jm−i

)(n−1m−1

)= − log n−1

∑1≤j<n

4

maxi∈[m]

(j−1i−1

)(n−jm−i

)(n−1m−1

) +∑

n4≤j< 3n

4

maxi∈[m]

(j−1i−1

)(n−jm−i

)(n−1m−1

) +∑

3n4≤j≤n

maxi∈[m]

(j−1i−1

)(n−jm−i

)(n−1m−1

)

≥ − log n−1

n2

+∑

n4≤j< 3n

4

maxi∈[m]

(j−1i−1

)(n−jm−i

)(n−1m−1

)

≥ − log n−1

n2

+∑

n4≤j< 3n

4

maxi∈[m]

((m− 1)!

(i− 1)!(m− i)!(j

n)i−1(

n− jn

)m−i + ε

)= − log n−1

n2

(1 + ε) +∑

n4≤j< 3n

4

maxi∈[m]

(m− 1)!

(i− 1)!(m− i)!(j

n)i−1(

n− jn

)m−i

.

According to Lemma A.1.3, i ∈ [ mjn+1

, mjn+1

+ 1]. For n4≤ j < 3n

4, m

8≤ i < 3m

4+ 1. Hence

maxi∈[m]

(m− 1)!

(i− 1)!(m− i)!(j

n)i−1(

n− jn

)m−i ≤ (j

n+n− jn

)m−1 − (j

n)m−1(

n− jn

)m−m

≤ 1− (1

4)m−1 ≤ 1− (

1

4)mc−1.

Page 170: SECURE QUERY COMMUNICATION AND PROCESSING …csi.utdallas.edu/Paper_Links/Dissertation-Xiao.pdf · 2015-10-15 · The security problems with the outsourced databases can be solved

158

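Consequently, we have

\[
\begin{aligned}
z_0 &\ge -\log\, n^{-1}\Bigg(\frac{n}{2}(1+\varepsilon) + \sum_{\frac{n}{4}\le j<\frac{3n}{4}}\max_{i\in[m]}\frac{(m-1)!}{(i-1)!\,(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i}\Bigg) \\
&\ge -\log\, n^{-1}\Bigg(\frac{n}{2}(1+\varepsilon) + \frac{n}{2}\bigg(1-\left(\frac{1}{4}\right)^{m_c-1}\bigg)\Bigg) \\
&\ge -\log\Bigg(1-\left(\frac{1}{2}\right)^{2m_c-1}+\frac{\varepsilon}{2}\Bigg) \\
&\ge -\log\Bigg(1-\left(\frac{1}{2}\right)^{2m_c}\Bigg).
\end{aligned}
\]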

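Let $c_{m_c,n_c} = \dfrac{-\log\left(1-\left(\frac{1}{2}\right)^{2m_c}\right)}{\log m_c} > 0$. Then $z_0 \ge -\log\left(1-\left(\frac{1}{2}\right)^{2m_c}\right) \ge c_{m_c,n_c}\log m_c \ge c_{m_c,n_c}\log m$. □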

We now prove the lower bound on $z_0$ in the following Theorem A.1.8.

Theorem A.1.8: For the ideal OPE object, there exists a constant $0 < c < 1$ such that for $n > m^2 > 1$,

\[
z_0 \ge c \log m.
\]

Proof. Note that $z_0$ is the average min-entropy of the hypergeometric distribution. Thus $z_0 < \log m$, so trivially we have $c < 1$. It remains to prove that we can in fact choose $c > 0$. We first prove the bound for $n > m^2$ and $m > m_c$, for some $m_c > 0$. Then we use the bound for $m \le m_c$ and $n \ge n_c$, for some $n_c > 0$ (Lemma A.1.7), to prove that we can choose $m_c = 1$.


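Based on Lemmas A.1.2, A.1.3, and A.1.6, for $\frac{1}{2} < \sigma < 1$, $n > m^2$, and $m \ge m_{\varepsilon,\sigma}$, we have

\[
\begin{aligned}
z_0 &= -\log\, n^{-1}\sum_{j\in[n]}\max_{i\in[m]}\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \\
&\ge -\log\, n^{-1}\Bigg(\frac{2n}{m^{\sigma}} + \sum_{j\in\left(\frac{n}{m^{\sigma}},\, n-\frac{n}{m^{\sigma}}\right]}\max_{i\in[m]}\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}\Bigg) \\
&\ge -\log\, n^{-1}\Bigg(\frac{2n}{m^{\sigma}} + \left(n-\frac{2n}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}}\Bigg) \\
&= -\log\Bigg(\frac{2}{m^{\sigma}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}}\Bigg) \\
&= \frac{1-\sigma}{2}\log m - \log\Bigg(2m^{\frac{1-3\sigma}{2}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\Bigg).
\end{aligned}
\]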

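Note that $c_{\varepsilon,\sigma,2}$ is the constant defined in Lemma A.1.6 and does not depend on $m$. Since $\sigma > \frac{1}{2}$, both $m^{\frac{1-3\sigma}{2}}$ and $m^{-\sigma}$ tend to $0$ as $m \to \infty$. It implies that

\[
\lim_{m\to\infty}\log\Bigg(2m^{\frac{1-3\sigma}{2}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\Bigg) = \log c_{\varepsilon,\sigma,2}.
\]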

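Therefore for $c_{\sigma} = \frac{1-\sigma}{4}$, there exists $m_c > m_{\varepsilon,\sigma}$ such that

\[
\frac{1-\sigma}{4}\log m - \log\Bigg(2m^{\frac{1-3\sigma}{2}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\Bigg) > 0
\]

for $m > m_c$. Hence

\[
\begin{aligned}
z_0 &\ge \frac{1-\sigma}{2}\log m - \log\Bigg(2m^{\frac{1-3\sigma}{2}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\Bigg) \\
&= \frac{1-\sigma}{4}\log m + \Bigg(\frac{1-\sigma}{4}\log m - \log\Bigg(2m^{\frac{1-3\sigma}{2}} + \left(1-\frac{2}{m^{\sigma}}\right)c_{\varepsilon,\sigma,2}\Bigg)\Bigg) \\
&\ge c_{\sigma}\log m
\end{aligned}
\]

for $n > m^2$ and $m > m_c$.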

Lemma A.1.7 allows us to conclude the proof of Theorem A.1.8. Note that the set $\{(m,n) \mid 1 < m \le m_c,\ m^2 < n < n_c\}$ is finite and that $\frac{z_0}{\log m} > 0$ for $(m,n)$ in that set. Since we have already obtained two nonzero lower bounds on $\frac{z_0}{\log m}$, the first for the case $n > m^2$ and $m > m_c$ and the second for the case $n \ge n_c$ and $1 < m \le m_c$, we can choose $c$ to be the minimum of

1. the bound in the first case,

2. the bound in the second case,

3. the set of values $\frac{z_0}{\log m}$ for $(m,n)$ in the finite set of remaining values.

In other words, we can choose $m_c = 1$, completing the proof of Theorem A.1.8. □

According to information theory, we have the following Corollary A.1.9 based on Theorem A.1.8. It gives an upper bound on the probability that the adversary recovers $x$ from the ciphertext $E^*(x, f)$.

Corollary A.1.9: Let $f$ be chosen uniformly at random from $SIF_{m,n}$ and let $x$ be chosen uniformly at random from $[m]$. Let $E_0$ denote the event that the adversary obtains $x$ from the ciphertext $E^*(x, f)$. Then for $n > m^2 > 1$,

\[
\Pr(E_0) \le 2^{-c\log m} = m^{-c}.
\]

Proof. According to Theorem A.1.8, there exists $0 < c < 1$ such that for $n > m^2 > 1$, $z_0 \ge c\log m$. Since $z_0$ is the average min-entropy, the probability that the adversary recovers $x$ from the ciphertext $E^*(x, f)$ is $2^{-z_0}$, which is at most $2^{-c\log m} = m^{-c}$. □
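The link between $z_0$ and the adversary's success probability can be checked by brute force for very small parameters. The sketch below is illustrative only and is not part of the original analysis: the function names are hypothetical, logarithms are taken to base 2, and the adversary is assumed to answer each ciphertext with its most likely plaintext. It compares the closed-form sum that defines $2^{-z_0}$ with a direct enumeration of all strictly increasing functions from $[m]$ to $[n]$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, log2


def guess_probability_formula(m, n):
    """(1/n) * sum_j max_i C(j-1,i-1) C(n-j,m-i) / C(n-1,m-1), i.e. 2**(-z0)."""
    total = sum(
        max(comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1))
        for j in range(1, n + 1)
    )
    return Fraction(total, n * comb(n - 1, m - 1))


def guess_probability_bruteforce(m, n):
    """Enumerate every strictly increasing f: [m] -> [n]; for each ciphertext y
    the adversary guesses the plaintext x that maps to y most often."""
    counts = {}  # (x, y) -> number of functions f with f(x) = y
    for image in combinations(range(1, n + 1), m):  # sorted tuple = one f
        for x, y in enumerate(image, start=1):
            counts[(x, y)] = counts.get((x, y), 0) + 1
    best = {}  # y -> largest count over all x
    for (x, y), c in counts.items():
        best[y] = max(best.get(y, 0), c)
    # x is uniform on [m] and f is uniform on the C(n, m) functions.
    return Fraction(sum(best.values()), m * comb(n, m))


if __name__ == "__main__":
    m, n = 3, 10
    p = guess_probability_formula(m, n)
    print("success probability:", p, " z0 =", -log2(p))
```

The enumeration grows as $\binom{n}{m}$, so it is only a sanity check for toy sizes; the closed-form sum is the practical route.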

The General Case

We now consider the case of $h$ known plaintext attacks with the set of plaintext/ciphertext pairs $KP = \{(x_i, y_i)\}_{i=1}^{h}$, where $y_i = E^*(x_i, f)$, $1 \le i \le h$. In this case, the plaintexts $x_i$ cut the domain into $h+1$ segments $[1, x_1), (x_1, x_2), \ldots, (x_{h-1}, x_h), (x_h, m]$, and the ciphertexts $y_i$ similarly cut the range into $h+1$ segments $[1, y_1), (y_1, y_2), \ldots, (y_{h-1}, y_h), (y_h, n]$. Since the encryption algorithm $E^*$ is order-preserving, it encrypts the plaintexts from the sub-domains $[x_i+1, x_{i+1}-1]$ to the sub-ranges $[y_i+1, y_{i+1}-1]$, where $0 \le i \le h$ and $x_0 = y_0 = 0$, $x_{h+1} = m+1$, $y_{h+1} = n+1$. We will proceed by applying Corollary A.1.9 to each pair of $[x_i+1, x_{i+1}-1]$ and $[y_i+1, y_{i+1}-1]$, $0 \le i \le h$.

In order to do so, we first give the following lemma. It relates the distance between a pair of plaintexts $x$ and $x'$ to the distance between the corresponding pair of ciphertexts $E^*(x, f)$ and $E^*(x', f)$. In particular, for $n \ge m^3$, $E^*(x', f) - E^*(x, f) - 1$ is greater than $(x' - x - 1)^2$ with high probability.

Lemma A.1.10: Suppose that $n \ge m^3$. Let $x \in [m]$ and $1 \le \delta \le m - x - 1$. Let $y = E^*(x, f)$ and choose $\delta'$ to satisfy $y + \delta' + 1 = E^*(x + \delta + 1, f)$, where $f$ is chosen uniformly at random from $SIF_{m,n}$. Then

\[
\Pr(\delta' \le \delta^2) \le \frac{1}{m^2}.
\]

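Proof. Note that there are

\[
\binom{y-1}{x-1}\binom{\delta'}{\delta}\binom{n-y-\delta'-1}{m-x-\delta-1}
\]

functions in $SIF_{m,n}$ that map $x$ to $y$ and $x+\delta+1$ to $y+\delta'+1$. Therefore

\[
\Pr(\delta' \le \delta^2) = \sum_{\delta'=\delta}^{\delta^2}\ \sum_{y=x}^{n-m-\delta'+x+\delta}\frac{\binom{y-1}{x-1}\binom{\delta'}{\delta}\binom{n-y-\delta'-1}{m-x-\delta-1}}{\binom{n}{m}}
\le n\sum_{\delta'=\delta}^{\delta^2}\frac{\binom{\delta'}{\delta}\binom{n-\delta'-2}{m-\delta-2}}{\binom{n}{m}}.
\]

For $\delta \le \delta' \le \delta^2$,

\[
\binom{\delta'}{\delta} \le \binom{\delta^2}{\delta} \le \frac{\delta^2}{\delta}\cdot\frac{\delta^2-1}{\delta-1}\cdot\frac{\delta^2-2}{\delta-2}\cdot\delta^{2(\delta-3)} \le \delta(\delta+1)(\delta+3)\, m^{2(\delta-3)}.
\]

Also,

\[
\begin{aligned}
n\binom{n-\delta'-2}{m-\delta-2}\binom{n}{m}^{-1}
&= n\cdot\frac{(n-\delta'-2)\cdots(n-m-\delta'+\delta+1)}{(m-\delta-2)!}\cdot\frac{m!}{n\cdots(n-m+1)} \\
&\le n\cdot\frac{m(m-1)\cdots(m-\delta-1)}{n(n-1)\cdots(n-\delta-1)}\cdot\frac{(n-\delta'-2)\cdots(n-m-\delta'+\delta+1)}{(n-\delta-2)\cdots(n-m+1)} \\
&\le m\left(\frac{1}{m^2}\cdots\frac{1}{m^2}\cdot 1\cdots 1\right) = m^{-2\delta-1}.
\end{aligned}
\]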


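Then we have

\[
\Pr(\delta' \le \delta^2) \le \sum_{\delta'=\delta}^{\delta^2}\delta(\delta+1)(\delta+3)\, m^{-7}
\le \delta(\delta+1)\big((\delta+3)(\delta^2-\delta)\big)\, m^{-7}
< \frac{\delta(\delta+1)(\delta+1)^3}{m^7} < \frac{m^5}{m^7} = \frac{1}{m^2}. \qquad \square
\]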
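For small parameters the probability in Lemma A.1.10 can be evaluated exactly from the counting formula at the start of the proof. The following sketch is an illustration under that formula only; the function names are hypothetical, and it assumes $1 \le \delta \le m - x - 1$ as in the lemma.

```python
from fractions import Fraction
from math import comb


def gap_distribution(m, n, x, delta):
    """Exact distribution of delta' = f(x+delta+1) - f(x) - 1 for f uniform
    over the strictly increasing functions [m] -> [n]."""
    total = comb(n, m)
    need = m - x - delta - 1  # plaintexts that lie above x + delta + 1
    dist = {}
    for dprime in range(delta, n):
        count = 0
        for y in range(x, n + 1):
            rest = n - y - dprime - 1  # ciphertexts above y + dprime + 1
            if rest < need:
                break  # rest only shrinks as y grows
            count += comb(y - 1, x - 1) * comb(dprime, delta) * comb(rest, need)
        if count:
            dist[dprime] = Fraction(count, total)
    return dist


def prob_small_gap(m, n, x, delta):
    """Pr(delta' <= delta**2), the quantity bounded by 1/m**2 in the lemma."""
    dist = gap_distribution(m, n, x, delta)
    return sum((p for d, p in dist.items() if d <= delta ** 2), Fraction(0))


if __name__ == "__main__":
    m = 4
    print(prob_small_gap(m, m ** 3, 1, 1), "vs bound", Fraction(1, m ** 2))
```

Because every function is counted exactly once by the positions of $f(x)$ and $f(x+\delta+1)$, the values returned by `gap_distribution` sum to 1, which gives a simple consistency check.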

Now we prove the generalization of Corollary A.1.9 to the case of arbitrary $h$.

Proposition A.1.11: Let $f$ be chosen uniformly at random from $SIF_{m,n}$, where $n \ge m^3$. Assume that the adversary knows $h$ plaintext/ciphertext pairs $(x_i, E^*(x_i, f))$, $1 \le i \le h$. Let $x$ be chosen uniformly at random from $[m]^* = [m] - \{x_i\}_{i=1}^{h}$ and let $E_h$ denote the event that the adversary obtains $x$ from the ciphertext $E^*(x, f)$ based on $KP$. Then

\[
\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}.
\]

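Proof. Without loss of generality, assume that $x_0 = 0 < x_1 < \cdots < x_h < x_{h+1} = m+1$. Let $D_j = [x_{j-1}+1, x_j-1]$, $1 \le j \le h+1$. Then

\[
\bigcup_{1\le j\le h+1} D_j = [m]^* = [m] - \{x_i\}_{i=1}^{h}.
\]

Let $\delta_j = |D_j| = x_j - x_{j-1} - 1$ and $\delta'_j = E^*(x_j, f) - E^*(x_{j-1}, f) - 1$, for $1 \le j \le h+1$. By Corollary A.1.9 and Lemma A.1.10, we know that

\[
\begin{aligned}
\Pr(E_h \mid x \in D_j) &= \Pr(E_h \mid x \in D_j,\ \delta'_j > \delta_j^2)\Pr(\delta'_j > \delta_j^2) + \Pr(E_h \mid x \in D_j,\ \delta'_j \le \delta_j^2)\Pr(\delta'_j \le \delta_j^2) \\
&\le \Pr(E_h \mid x \in D_j,\ \delta'_j > \delta_j^2) + \Pr(\delta'_j \le \delta_j^2) \\
&\le \delta_j^{-c} + m^{-2}.
\end{aligned}
\]

Since $\sum_{1\le j\le h+1}\delta_j = m-h$, we have

\[
\sum_{1\le j\le h+1}\frac{\delta_j^{1-c}}{m-h} \le \sum_{1\le j\le h+1}\frac{\left((m-h)/(h+1)\right)^{1-c}}{m-h} = \left(\frac{h+1}{m-h}\right)^{c}. \tag{1}
\]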
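The step marked (1) is an instance of Jensen's inequality. The short derivation below spells it out; the name $g$ is introduced here only for illustration. Since $0 < c < 1$, the function $g(t) = t^{1-c}$ is concave on $t \ge 0$, so the average of $g$ over the $\delta_j$ is at most $g$ of their average.

```latex
\[
\frac{1}{h+1}\sum_{1\le j\le h+1}\delta_j^{1-c}
\;\le\; \Bigg(\frac{1}{h+1}\sum_{1\le j\le h+1}\delta_j\Bigg)^{1-c}
= \left(\frac{m-h}{h+1}\right)^{1-c},
\]
\[
\text{hence}\qquad
\sum_{1\le j\le h+1}\frac{\delta_j^{1-c}}{m-h}
\;\le\; \frac{h+1}{m-h}\left(\frac{m-h}{h+1}\right)^{1-c}
= \left(\frac{h+1}{m-h}\right)^{c}.
\]
```

Equality holds exactly when all $\delta_j$ are equal, which is the situation discussed in Remark A.1.2.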


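Thus, for $n \ge m^3$, we have

\[
\begin{aligned}
\Pr(E_h) &= \sum_{1\le j\le h+1}\Pr(E_h \mid x \in D_j)\Pr(x \in D_j)
\le \sum_{1\le j\le h+1}\left(\delta_j^{-c} + m^{-2}\right)\frac{\delta_j}{m-h} \\
&= \sum_{1\le j\le h+1}\frac{\delta_j^{1-c}}{m-h} + \sum_{1\le j\le h+1}\frac{\delta_j}{m^2(m-h)}
\le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}. \qquad \square
\end{aligned}
\]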

Remark A.1.1: Note that if $h = o(m^{\varepsilon})$, $0 < \varepsilon < 1$, the bound on $\Pr(E_h)$ satisfies $\left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2} \le \left(\frac{m^{\varepsilon}+1}{m-m^{\varepsilon}}\right)^{c} + \frac{1}{m^2} \approx \frac{1}{m^{(1-\varepsilon)c}} + \frac{1}{m^2}$, which is a negligible function of the security parameter $\log m$. Hence, in the case $n = m^3$, the probability that the adversary fully recovers the plaintext is a negligible function of the security parameter $\log m$ if the number of known plaintext/ciphertext pairs satisfies $h = o(m^{\varepsilon})$, where $0 < \varepsilon < 1$.

Remark A.1.2: Note that the inequality (1) in the proof of Proposition A.1.11 becomes an equality if and only if $\delta_j = \frac{m-h}{h+1}$, $1 \le j \le h+1$. This implies that the attack is most effective when the known plaintexts are evenly distributed in the domain. Consider the following two types of known plaintext attacks. In the first case, the adversary knows the plaintext/ciphertext pair $(1, E^*(1, f))$, so by Corollary A.1.9, we have $\Pr(E_1) \le \frac{1}{(m-1)^{c}}$. In the second case, the adversary knows the plaintext/ciphertext pair $\left(\frac{m}{2}, E^*\!\left(\frac{m}{2}, f\right)\right)$, so by Proposition A.1.11, we have $\Pr(E_1) \le \left(\frac{2}{m-1}\right)^{c} + \frac{1}{m^2}$. Since $\frac{1}{(m-1)^{c}} \le \left(\frac{2}{m-1}\right)^{c} + \frac{1}{m^2}$, this implies that the second attack is more effective.

Remark A.1.3: Note that the bound given in Proposition A.1.11 for $h = 0$ is asymptotically identical to that given in Corollary A.1.9. The bound given in Proposition A.1.11 equals 1 when $h$ reaches $\frac{m}{2}$. Actually, if the adversary knows the plaintext/ciphertext pairs $(x_i, f(x_i))$ where $x_i = 2i-1$, $1 \le i \le m/2$, then the adversary can recover the plaintext of any newly given ciphertext.

Now we show the corresponding lower bound on $z_h$ for the ideal OPE object in the following theorem.

Theorem A.1.12: For the ideal OPE object, there exists a constant $0 < c < 1$ such that for $n \ge m^3 > 1$,

\[
z_h \ge c\log\frac{m-h}{h+1} - \frac{1}{(\ln 2)\, m^{2-c}}.
\]

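Proof. Since it has been proved in Proposition A.1.11 that $\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}$, we have

\[
\begin{aligned}
z_h = \log\frac{1}{\Pr(E_h)} &\ge \log\frac{1}{\left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}} \\
&= c\log\frac{m-h}{h+1} - \log\Bigg(1 + \frac{\left(\frac{m-h}{h+1}\right)^{c}}{m^2}\Bigg) \\
&\ge c\log\frac{m-h}{h+1} - \log\left(1 + \frac{1}{m^{2-c}}\right) \\
&\ge c\log\frac{m-h}{h+1} - \frac{1}{(\ln 2)\, m^{2-c}}. \qquad (2)
\end{aligned}
\]

The correctness of inequality (2) is based on the fact that $\frac{x}{\ln 2} \ge \log(1+x)$ for $x > 0$: the two sides are equal at $x = 0$, and $\frac{d}{dx}\left(\frac{x}{\ln 2} - \log(1+x)\right) = \frac{1}{\ln 2} - \frac{1}{(\ln 2)(1+x)} = \frac{x}{(\ln 2)(1+x)} > 0$ for $x > 0$. □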
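The two bounds are easy to evaluate for concrete parameters. The sketch below is only illustrative: the function names are hypothetical, logarithms are base 2, and the constant `c` must be supplied by the caller, because the analysis proves its existence without giving its value.

```python
from math import log, log2


def recovery_bound(m, h, c):
    """Upper bound on Pr(E_h) from Proposition A.1.11 (requires 0 <= h < m)."""
    return ((h + 1) / (m - h)) ** c + 1 / m ** 2


def entropy_bound(m, h, c):
    """Lower bound on z_h from Theorem A.1.12, in bits."""
    return c * log2((m - h) / (h + 1)) - 1 / (log(2) * m ** (2 - c))


if __name__ == "__main__":
    c = 0.4  # assumed value, chosen only for illustration
    for h in (0, 1, 10, 100):
        m = 2 ** 20
        print(h, recovery_bound(m, h, c), entropy_bound(m, h, c))
```

By the chain of inequalities in the proof, `-log2(recovery_bound(m, h, c))` is never smaller than `entropy_bound(m, h, c)`.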

Numerically Computing the Value of c

Here we include a graph showing numerically computed values of $c' = \frac{z_0}{\log m}$ as a function of $m$. We include the cases $n = m^2$ and $n = m^3$. These estimates translate into estimates for $z_h$, the number of bits of information that are guaranteed to remain secret from the adversary in the case of an attack with $h$ known plaintext/ciphertext pairs. The corresponding bound on the probability that the adversary recovers the plaintext from the ciphertext with $h$ known plaintext/ciphertext pairs is $\left(\frac{h+1}{m-h}\right)^{c'} + \frac{1}{m^2}$.

Figure A.1. Numerically Computed $c' = z_0/\log m$ Against $m$.

As can be seen from Figure A.1, for $20 \le m \le 500$, the value of $c'$ is well over 0.4, indicating that more than 40% of the bits of a plaintext are protected from the adversary, rendering it unlikely for the adversary to recover the complete plaintext despite the order-preserving nature of the encryption scheme. A more precise analysis of the values of $c'$ for large $m$ would greatly enhance our understanding of the security of the algorithm for typical values of $m$, such as $m = 2^{1024}$. We have proved in Theorem A.1.8 that $c' \to c$ as $m \to \infty$ for both $n = m^2$ and $n = m^3$. We conjecture that $c \approx 0.5$ in both cases.
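A direct way to reproduce such values for small $m$ is to evaluate the defining sum for $z_0$ exactly. The sketch below is an assumption about how this could be done, not the procedure used to produce Figure A.1: the function names are hypothetical, logarithms are base 2, and the double loop costs on the order of $nm$ binomial evaluations, so it is practical only for small $m$.

```python
from fractions import Fraction
from math import comb, log2


def z0(m, n):
    """z0 = -log2( (1/n) * sum_j max_i C(j-1,i-1) C(n-j,m-i) / C(n-1,m-1) )."""
    total = sum(
        max(comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1))
        for j in range(1, n + 1)
    )
    return -log2(Fraction(total, n * comb(n - 1, m - 1)))


def c_prime(m, n):
    """Ratio z0 / log2(m), the quantity plotted against m."""
    return z0(m, n) / log2(m)


if __name__ == "__main__":
    for m in range(2, 13):
        print(m, round(c_prime(m, m ** 2), 3), round(c_prime(m, m ** 3), 3))
```

For larger $m$, the inner maximum could be restricted to the interval for $i$ given in Lemma A.1.3 instead of scanning all of $[m]$; that optimization is omitted here to keep the sketch close to the definition.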

A.2 Security Proof for PPE Schemes

Existing cryptographic security proofs for PPE schemes only reduce the security of real PPE

schemes to the security of the ideal PPE object by showing that they are computationally

indistinguishable. However it is not a complete security proof since the security of the ideal

PPE object is unknown and there has been no security analysis in the literature to show its


security level. In this section, we complete the existing security proof by proving that the

ideal PPE object is secure under IND-PCPA.

To prove that the ideal PPE object is secure under IND-PCPA, we need to show that the number of prefix-preserving functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of prefix-preserving functions mapping $x_i^1$ to $E^*(x_i^b, k)$, where $(x_i^0, x_i^1)$ are the plaintext pairs the adversary queries, $1 \le i \le h$. In other words, there is no bias for the adversary's guess. However, the proof is not straightforward because it needs to use the prefix-preserving property to count the number of prefix-preserving functions mapping $x_i^0$ (resp. $x_i^1$) to $E^*(x_i^b, k)$, where $x_i^0$, $x_i^1$, and $E^*(x_i^b, k)$ are indeterminates, $1 \le i \le h$.

To overcome the difficulties, we represent the prefix-preserving function by the tree-based

function. The tree-based function consists of a plaintext tree and a ciphertext tree. The

plaintext tree is a complete binary tree. Each edge connecting a parent node to its left child

node is labeled by 0, and each edge connecting a parent node to its right child node is labeled

by 1. Each leaf node in the plaintext tree is labeled by the binary string composed of the

labels of the edges from the root to itself (the label represents the plaintext string). The

ciphertext tree is the same as the plaintext tree except for its labels. Each edge connecting

a parent node to its left child node could be labeled by 0 or 1. If it is labeled by 0 (resp.

1), then the corresponding edge connecting the parent node to its right child node must

be labeled by 1 (resp. 0). A tree-based function maps the i-th leaf node in the plaintext
tree to the i-th leaf node in the ciphertext tree. It implies that the labels of each path in the
ciphertext tree (from the root node to the leaf node) represent the ciphertext of the plaintext
represented by the corresponding path in the plaintext tree. In other words, the label of the
i-th leaf node in the ciphertext tree represents the ciphertext of the label of the i-th leaf node
in the plaintext tree.

Once the prefix-preserving function is represented by the tree-based function, it suffices to show that the number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$. An important observation is that, given $h$ plaintext/ciphertext pairs, some labels of the edges in the ciphertext tree will be determined while others will not. Also, the number of undetermined edge labels in the ciphertext tree decides the number of tree-based functions. Therefore the security proof reduces to showing that the number of undetermined edge labels in the ciphertext tree given $(x_i^0, E^*(x_i^b, k))$ equals the number of undetermined edge labels in the ciphertext tree given $(x_i^1, E^*(x_i^b, k))$, $1 \le i \le h$. We use mathematical induction on $h$ to prove the equality of these two numbers.

A.2.1 Tree-Based Function Definition

Before defining the tree-based function, we first define some preliminary concepts. Then,

the tree-based function is formally defined in Definition A.2.3.

Definition A.2.1: Let $T = (V^T, E^T)$ be a tree where $V^T$ denotes the set of nodes and $E^T$ denotes the set of edges. The nodes in $V^T$ can be partitioned into the set of internal nodes $V_I^T$ and the set of leaf nodes $V_L^T$, where $V^T = V_I^T \cup V_L^T$ and $V_I^T \cap V_L^T = \emptyset$. Let $v_i^{L,T}$ denote the $i$-th leaf node in $T$, where the leaf nodes are indexed from the leftmost leaf node (the first) to the rightmost leaf node (the $|V_L^T|$-th), $1 \le i \le |V_L^T|$. For $v \in V_L^T$ with depth $n$, let $P(v)$ denote the path from the root to $v$ and $P(v)[1] \cdots P(v)[n+1]$ denote the nodes on the path, where $P(v)[1]$ is the root, $P(v)[n+1] = v$, and $P(v)[2], \cdots, P(v)[n]$ are internal nodes connecting the root and $v$. Let $P_I(v) = \{P(v)[i] \mid 1 \le i \le n\}$ denote the set of internal nodes on the path $P(v)$, $P_L(v) = \{v\}$ denote the set containing the leaf node on the path $P(v)$, and $P_E(v) = \{P(v)[i]P(v)[i+1] \mid 1 \le i \le n\}$ denote the set of edges on the path $P(v)$. □

In the tree-based function, the domain of the plaintexts and the range of the ciphertexts are two labeled trees. The labeling rules (defined in Definition A.2.2) guarantee the prefix-preserving property of the tree-based function. In Definition A.2.2, we first define the internal nodes labeled (INL) tree, where the internal nodes are labeled with 0 or 1, and then define the nodes and edges labeled (NEL) tree, where the labels are extended from the internal nodes to the edges and leaf nodes.


Definition A.2.2 (INL and NEL trees): An internal nodes labeled (INL) tree is defined to be a pair $(T, \mathcal{L})$, where $T = (V^T, E^T)$ is a tree and

\[
\mathcal{L} : V_I^T \to \{0, 1\}
\]

is a label function over internal nodes, which is called the INL function. An INL tree $(T, \mathcal{L})$ uniquely defines the nodes and edges labeled (NEL) tree $(T, \mathcal{L}^*)$, where the NEL function

\[
\mathcal{L}^* : V^T \cup E^T \to \{0, 1\}^*
\]

is defined by the following rules.

(1) For $v \in V_I^T$, $\mathcal{L}^*(v) \triangleq \mathcal{L}(v)$.

(2) Let $e \in E^T$ where $e = v_e^1 v_e^2$ and $v_e^1$, $v_e^2$ denote the two endpoints of $e$. Without loss of generality, assume that $v_e^1$ is the parent node and $v_e^2$ is the child node. Then $\mathcal{L}^*(e) \triangleq \mathcal{L}(v_e^1)$ if $v_e^2$ is the left child node of $v_e^1$; $\mathcal{L}^*(e) \triangleq 1 \oplus \mathcal{L}(v_e^1)$ if $v_e^2$ is the right child node of $v_e^1$.

(3) For $v \in V_L^T$ with depth $n$, $\mathcal{L}^*(v)$ is a string of $n$ bits. Let $P_E(v) = \{P(v)[i]P(v)[i+1] \mid 1 \le i \le n\}$ denote the set of edges on the path $P(v)$, where $P(v)[1]$ is the root, $P(v)[n+1] = v$, and $P(v)[2], \cdots, P(v)[n]$ are internal nodes connecting the root and $v$. Then

\[
\mathcal{L}^*(v) \triangleq \mathcal{L}^*(P(v)[1]P(v)[2]) \cdots \mathcal{L}^*(P(v)[n]P(v)[n+1]). \qquad \square
\]

Now we are ready to define the tree-based function, which is given in Definition A.2.3. The tree-based function is defined with respect to two NEL trees, which are called the plaintext tree and the ciphertext tree, respectively. It maps the label of the $i$-th leaf node in the plaintext tree to the label of the $i$-th leaf node in the ciphertext tree.

Definition A.2.3 (tree-based function): The tree-based function is defined with respect to two NEL trees: the plaintext tree $PT_l = (T_{PT_l}, \mathcal{L}^*_{PT_l})$ and the ciphertext tree $CT_l = (T_{CT_l}, \mathcal{L}^*_{CT_l})$, where $T_{PT_l}$ and $T_{CT_l}$ are two complete binary trees of height $l$. In the plaintext tree $PT_l$, the INL function $\mathcal{L}_{PT_l}(v) \equiv 0$ for any internal node $v \in V_I^{T_{PT_l}}$. But in the ciphertext tree $CT_l$, the INL function $\mathcal{L}_{CT_l}(v)$ could be 0 or 1 for any internal node $v \in V_I^{T_{CT_l}}$. The INL functions $\mathcal{L}_{PT_l}$ and $\mathcal{L}_{CT_l}$ uniquely define the NEL functions $\mathcal{L}^*_{PT_l}$ and $\mathcal{L}^*_{CT_l}$ following the rules defined in Definition A.2.2.

Given $PT_l = (T_{PT_l}, \mathcal{L}^*_{PT_l})$ and $CT_l = (T_{CT_l}, \mathcal{L}^*_{CT_l})$, we define the corresponding tree-based function

\[
f_{PT_l,CT_l} : \{0,1\}^l \to \{0,1\}^l, \qquad
f_{PT_l,CT_l}\big(\mathcal{L}^*_{PT_l}(v_i^{L,T_{PT_l}})\big) \triangleq \mathcal{L}^*_{CT_l}(v_i^{L,T_{CT_l}}),
\]

where $v_i^{L,T_{PT_l}}$ and $v_i^{L,T_{CT_l}}$ denote the $i$-th leaf nodes in the plaintext tree and ciphertext tree, respectively, $1 \le i \le 2^l$. Let $TBF_l$ denote the set of all tree-based functions, i.e.,

\[
TBF_l = \{f_{PT_l,CT_l} \mid \mathcal{L}_{CT_l} : V_I^{T_{CT_l}} \to \{0,1\}\}. \qquad \square
\]

Remark A.2.1: In the definition of the tree-based function $f_{PT_l,CT_l}$, the plaintext tree $PT_l$ is fixed since $\mathcal{L}_{PT_l}$ is fixed, but the ciphertext tree is not fixed. Since $\mathcal{L}_{CT_l}$ uniquely defines $\mathcal{L}^*_{CT_l}$, it also determines the ciphertext tree $CT_l$. Therefore, the INL function $\mathcal{L}_{CT_l}$ of the ciphertext tree uniquely determines the tree-based function $f_{PT_l,CT_l}$.
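The construction in Definitions A.2.2 and A.2.3 can be sketched in a few lines. The encoding below is an assumption made for illustration and does not come from the original text: each internal node is named by the plaintext prefix that leads to it (the empty string is the root), and `ct_labels` plays the role of the INL function of the ciphertext tree. Because the plaintext INL function is identically 0, a plaintext bit 0 means "go left" and 1 means "go right", so the ciphertext edge label at a node is its INL label XOR the plaintext bit.

```python
import random
from itertools import product


def internal_nodes(l):
    """Internal nodes of a complete binary tree of height l, each named by the
    left/right path from the root written as a bit string ('' is the root)."""
    return ["".join(bits) for depth in range(l) for bits in product("01", repeat=depth)]


def tree_based_function(ct_labels, l):
    """Build f from the ciphertext INL labels (dict: node name -> 0 or 1)."""
    def f(x):
        assert len(x) == l
        # Left edge carries the node label, right edge carries 1 XOR the label.
        return "".join(str(ct_labels[x[:u]] ^ int(x[u])) for u in range(l))
    return f


def lcp_len(a, b):
    """Length of the longest common prefix of two bit strings."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


if __name__ == "__main__":
    l = 4
    labels = {v: random.randint(0, 1) for v in internal_nodes(l)}
    f = tree_based_function(labels, l)
    xs = ["".join(b) for b in product("01", repeat=l)]
    print(all(lcp_len(f(a), f(b)) == lcp_len(a, b) for a in xs for b in xs))
```

With all labels set to 0 the ciphertext tree coincides with the plaintext tree and `f` is the identity; flipping the label of a single internal node flips one ciphertext bit for every plaintext below that node.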

We show the equivalence of the tree-based function and the prefix-preserving function in Proposition A.2.1.

Proposition A.2.1: $TBF_l = F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$.

Proof. First we show that $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. For any $x_1, x_2 \in \{0,1\}^l$, there exist $v_{j_1}^{L,T_{PT}}, v_{j_2}^{L,T_{PT}} \in V_L^{T_{PT}}$ such that $\mathcal{L}^*_{PT}(v_{j_1}^{L,T_{PT}}) = x_1$ and $\mathcal{L}^*_{PT}(v_{j_2}^{L,T_{PT}}) = x_2$. According to the definition of $\mathcal{L}^*_{PT}$, the paths $P(v_{j_1}^{L,T_{PT}})$ and $P(v_{j_2}^{L,T_{PT}})$ share $|LCP(x_1, x_2)|$ common edges. Therefore, on the ciphertext tree, the paths $P(v_{j_1}^{L,T_{CT}})$ and $P(v_{j_2}^{L,T_{CT}})$ also share $|LCP(x_1, x_2)|$ common edges. For any tree-based function $f_{PT,CT} \in TBF_l$, we have

\[
f_{PT,CT}(x_1) = f_{PT,CT}\big(\mathcal{L}^*_{PT}(v_{j_1}^{L,T_{PT}})\big) \triangleq \mathcal{L}^*_{CT}(v_{j_1}^{L,T_{CT}})
\quad\text{and}\quad
f_{PT,CT}(x_2) = f_{PT,CT}\big(\mathcal{L}^*_{PT}(v_{j_2}^{L,T_{PT}})\big) \triangleq \mathcal{L}^*_{CT}(v_{j_2}^{L,T_{CT}}).
\]

Hence, $|LCP(f_{PT,CT}(x_1), f_{PT,CT}(x_2))| = |LCP(x_1, x_2)|$ according to the definition of $\mathcal{L}^*_{CT}$. It implies that $f_{PT,CT} \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. So $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. Second, according to the definition of $TBF_l$ and the cardinality of $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$ computed in Lemma 6.1.1,

\[
|TBF_l| = \big|\{\mathcal{L}_{CT} : V_I^{T_{CT}} \to \{0,1\}\}\big| = 2^{|V_I^{T_{CT}}|} = 2^{2^l-1} = \big|F^{PPE}_{\{0,1\}^l,\{0,1\}^l}\big|.
\]

Since $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$ and $|TBF_l| = |F^{PPE}_{\{0,1\}^l,\{0,1\}^l}|$, consequently $TBF_l = F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. □

Remark A.2.2: According to the proof of Proposition A.2.1, the prefix-preserving property of the tree-based function can be geometrically interpreted as follows. Any $x \in \{0,1\}^l$ corresponds to the $j$-th leaf node $v_j^{L,T_{PT_l}}$ in $V_L^{T_{PT_l}}$ of the plaintext tree. Actually, $j = B(x) + 1$, where $B(x)$ denotes the number whose binary representation is $x$. For $x_1, x_2 \in \{0,1\}^l$, the paths $P(v_{j_1}^{L,T_{PT_l}})$ and $P(v_{j_2}^{L,T_{PT_l}})$ on the plaintext tree share $|LCP(x_1, x_2)|$ common edges. The tree-based function has the prefix-preserving property, so the paths $P(v_{j_1}^{L,T_{CT_l}})$ and $P(v_{j_2}^{L,T_{CT_l}})$ on the ciphertext tree also share $|LCP(x_1, x_2)|$ common edges.

Based on Proposition A.2.1, we give an alternative definition for the ideal PPE object in Definition A.2.4.

Definition A.2.4 (alternative definition of ideal PPE object): It has the same definition as that of the original ideal PPE object except that $K^*$ selects $f$ uniformly at random from $TBF_l$ instead of $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. □

A.2.2 Security Proof

Now we prove that the ideal PPE object is secure under IND-PCPA. It also implies that the real PPE schemes, which are computationally indistinguishable from the ideal PPE object, achieve the highest security notion for PPE. Essentially, we need to show that in the security notion IND-PCPA, the number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, where $(x_i^0, x_i^1)$ are the queried plaintext pairs, $1 \le i \le h$. Since $\mathcal{L}_{CT_l}$ uniquely determines the tree-based function, in order to count those numbers, we need to consider the effect on $\mathcal{L}_{CT_l}$ (part of the mapping will be determined) of the given plaintext/ciphertext pairs $(x_i^0, E^*(x_i^b, k))$ / $(x_i^1, E^*(x_i^b, k))$, $1 \le i \le h$.


Lemma A.2.2: Given $h$ plaintext/ciphertext pairs $(x_i, y_i)$ of $f_{PT_l,CT_l}$, the labels of the internal nodes on the $h$ paths $P(v_{j_i})$ are determined, where $v_{j_i}$ is decided by $x_i$ and the labels are decided by $y_i$, $1 \le i \le h$.

Proof. Consider a plaintext/ciphertext pair $(x_i, y_i) \in \{0,1\}^l \times \{0,1\}^l$ such that $f_{PT_l,CT_l}(x_i) = y_i$. We assume that $x_i = \mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})$, where $v_{j_i}^{L,T_{PT_l}}$ denotes the $j_i$-th leaf node in the plaintext tree, and $y_i = y_{i1} \cdots y_{il}$, $y_{iu} \in \{0,1\}$, $1 \le u \le l$. According to the definition of the tree-based function,

\[
y_i = f_{PT_l,CT_l}(x_i) = f_{PT_l,CT_l}\big(\mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})\big) = \mathcal{L}^*_{CT_l}(v_{j_i}^{L,T_{CT_l}}).
\]

Therefore, $\mathcal{L}^*_{CT_l}\big(P(v_{j_i}^{L,T_{CT_l}})[u]\,P(v_{j_i}^{L,T_{CT_l}})[u+1]\big) = y_{iu}$ for $1 \le u \le l$ according to the definition of $\mathcal{L}^*_{CT_l}$. It implies that the labels of the edges on the path $P(v_{j_i}^{L,T_{CT_l}})$ are determined by $y_i$. Since the labels of the internal nodes on the path and the labels of the edges on the same path can be mutually decided, the labels of the internal nodes on the path $P(v_{j_i}^{L,T_{CT_l}})$ are decided by $y_i$. Hence, given the plaintext/ciphertext pairs $(x_i, y_i)$ where $x_i = \mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})$, for the INL function $\mathcal{L}_{CT_l}$, the labels of the internal nodes on the path $P(v_{j_i}^{L,T_{CT_l}})$ are determined, $1 \le i \le h$. □
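Lemma A.2.2 has a direct computational reading: each known pair fixes the labels along one root-to-leaf path, and every other label stays free. The sketch below is illustrative and uses an assumed encoding (internal nodes named by plaintext prefixes, hypothetical function names). It derives the forced labels from the known pairs and checks by exhaustive enumeration that the number of consistent tree-based functions is 2 raised to the number of free labels.

```python
from itertools import product


def internal_nodes(l):
    """Internal nodes of a complete binary tree of height l, named by prefix."""
    return ["".join(b) for d in range(l) for b in product("01", repeat=d)]


def encrypt(labels, x):
    """Apply the tree-based function given by the ciphertext INL labels."""
    return "".join(str(labels[x[:u]] ^ int(x[u])) for u in range(len(x)))


def determined_labels(pairs):
    """Labels forced by known (plaintext, ciphertext) pairs: the node reached
    by prefix x[:u] must carry the label x[u] XOR y[u]."""
    forced = {}
    for x, y in pairs:
        for u in range(len(x)):
            bit = int(x[u]) ^ int(y[u])
            if forced.setdefault(x[:u], bit) != bit:
                return None  # the pairs fit no tree-based function
    return forced


def count_consistent(l, pairs):
    """Brute-force count of labelings whose function maps every x to its y."""
    nodes = internal_nodes(l)
    count = 0
    for bits in product((0, 1), repeat=len(nodes)):
        labels = dict(zip(nodes, bits))
        if all(encrypt(labels, x) == y for x, y in pairs):
            count += 1
    return count


if __name__ == "__main__":
    pairs = [("000", "101"), ("001", "100"), ("110", "011")]
    forced = determined_labels(pairs)
    print(len(forced), count_consistent(3, pairs))
```

In the example, the three pairs force 5 of the 7 internal labels, leaving $2^{2} = 4$ consistent functions.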

Consider the adversary counting the number of tree-based functions mapping $x_i^0$ / $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$. Since the tree-based function is uniquely determined by the INL function $\mathcal{L}_{CT_l}$ (Remark A.2.1), this is equivalent to counting the number of INL functions. The important observation is that, according to Lemma A.2.2, the labels of the internal nodes on the corresponding $h$ paths are determined. Therefore it suffices to count the remaining undetermined labels, since they decide the number of INL functions. Following this idea, we use mathematical induction on $h$ to prove that the two numbers are identical in Lemma A.2.3.

Lemma A.2.3: The number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$.

Proof. Let $x_i^0 = \mathcal{L}^*_{PT_l}(v^{L,T_{PT_l}}_{j_i^0})$, where $v^{L,T_{PT_l}}_{j_i^0}$ denotes the $j_i^0$-th leaf node in the plaintext tree, and $x_i^1 = \mathcal{L}^*_{PT_l}(v^{L,T_{PT_l}}_{j_i^1})$, where $v^{L,T_{PT_l}}_{j_i^1}$ denotes the $j_i^1$-th leaf node in the plaintext tree, $1 \le i \le h$. For tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$, the labels of the internal nodes on the path $P(v^{L,T_{CT_l}}_{j_i^0})$ in the ciphertext tree are determined; for tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, the labels of the internal nodes on the path $P(v^{L,T_{CT_l}}_{j_i^1})$ in the ciphertext tree are determined, $1 \le i \le h$ (Lemma A.2.2). Hence it suffices to prove that the determined labels of the internal nodes in the two ciphertext trees are assigned consistent values and that the numbers of undetermined labels of the internal nodes in the two ciphertext trees are identical, i.e.,

\[
\Big|\bigcup_{1\le i\le h} P_I(v^{L,T_{CT_l}}_{j_i^0})\Big| = \Big|\bigcup_{1\le i\le h} P_I(v^{L,T_{CT_l}}_{j_i^1})\Big| \tag{6.1}
\]

where $P_I(v)$ (defined in Definition A.2.1) denotes the set of internal nodes on the path $P(v)$.

We use mathematical induction on $h$ to prove it. For $h = 1$, it is obvious that the labels in $P_I(v^{L,T_{CT_l}}_{j_1^0})$ / $P_I(v^{L,T_{CT_l}}_{j_1^1})$ are assigned consistent values with respect to $E^*(x_1^b, k)$ according to the proof of Lemma A.2.2. Also $|P_I(v^{L,T_{CT_l}}_{j_1^0})| = l = |P_I(v^{L,T_{CT_l}}_{j_1^1})|$, since by Definition A.2.1 a leaf of depth $l$ has $l$ internal nodes on its path. So we assume that the induction hypothesis holds for $h < h'$ and consider the situation for $h = h'$. According to the induction hypothesis, the labels in $\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^0})$ / $\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^1})$ are assigned consistent values. Also, $\big|\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^0})\big| = \big|\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^1})\big|$ and $|P_I(v^{L,T_{CT_l}}_{j_{h'}^0})| = |P_I(v^{L,T_{CT_l}}_{j_{h'}^1})|$. Since $(x_i^0, x_i^1) \in PPP_{h'}$, $LCP(x_{h'}^0, x_i^0) = LCP(x_{h'}^1, x_i^1)$ for $1 \le i \le h'-1$ according to the definition of $PPP_{h'}$. Since $x_i^0 = \mathcal{L}^*_{PT_l}(v^{L,T_{PT_l}}_{j_i^0})$ and $x_i^1 = \mathcal{L}^*_{PT_l}(v^{L,T_{PT_l}}_{j_i^1})$ for $1 \le i \le h'$, we have $|P_I(v^{L,T_{PT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{PT_l}}_{j_i^0})| = |P_I(v^{L,T_{PT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{PT_l}}_{j_i^1})|$ for $1 \le i \le h'-1$ according to the definition of the NEL function $\mathcal{L}^*_{PT_l}$. Therefore

\[
\big|P_I(v^{L,T_{CT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{CT_l}}_{j_i^0})\big| = \big|P_I(v^{L,T_{CT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{CT_l}}_{j_i^1})\big| \tag{6.2}
\]

for $1 \le i \le h'-1$ according to the conclusions in Remark A.2.2. Without loss of generality, we assume $b = 0$. So the labels in $P_I(v^{L,T_{CT_l}}_{j_{h'}^0})$ / $P_I(v^{L,T_{CT_l}}_{j_i^0})$ are assigned consistent values for $1 \le i \le h'-1$, i.e., the labels in $P_I(v^{L,T_{CT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{CT_l}}_{j_i^0})$ are assigned the same values whether they are derived from $E^*(x_{h'}^b, k)$ or from $E^*(x_i^b, k)$, for $1 \le i \le h'-1$. Consequently, the labels in $P_I(v^{L,T_{CT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{CT_l}}_{j_i^1})$ are assigned the same values whether they are derived from $E^*(x_{h'}^b, k)$ or from $E^*(x_i^b, k)$, for $1 \le i \le h'-1$, based on the proof of Lemma A.2.2 and (6.2), which implies that the labels in $P_I(v^{L,T_{CT_l}}_{j_{h'}^1})$ / $P_I(v^{L,T_{CT_l}}_{j_i^1})$ are assigned consistent values for $1 \le i \le h'-1$. Therefore, the labels in $\bigcup_{1\le i\le h'} P_I(v^{L,T_{CT_l}}_{j_i^0})$ / $\bigcup_{1\le i\le h'} P_I(v^{L,T_{CT_l}}_{j_i^1})$ are assigned consistent values.


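Also, let $1 \le i_0 \le h'-1$ be such that

\[
\big|P_I(v^{L,T_{CT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{CT_l}}_{j_{i_0}^0})\big| = \max_{1\le i\le h'-1}\Big\{\big|P_I(v^{L,T_{CT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{CT_l}}_{j_i^0})\big|\Big\}
\]

and

\[
\big|P_I(v^{L,T_{CT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{CT_l}}_{j_{i_0}^1})\big| = \max_{1\le i\le h'-1}\Big\{\big|P_I(v^{L,T_{CT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{CT_l}}_{j_i^1})\big|\Big\}.
\]

By (6.2), the same index $i_0$ attains both maxima. Then

\[
\begin{aligned}
\Big|\bigcup_{1\le i\le h'} P_I(v^{L,T_{CT_l}}_{j_i^0})\Big|
&= \Big|\Big(\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^0})\Big) \cup P_I(v^{L,T_{CT_l}}_{j_{h'}^0})\Big| \\
&= \Big|\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^0})\Big| + \big|P_I(v^{L,T_{CT_l}}_{j_{h'}^0})\big| - \big|P_I(v^{L,T_{CT_l}}_{j_{h'}^0}) \cap P_I(v^{L,T_{CT_l}}_{j_{i_0}^0})\big| \\
&= \Big|\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^1})\Big| + \big|P_I(v^{L,T_{CT_l}}_{j_{h'}^1})\big| - \big|P_I(v^{L,T_{CT_l}}_{j_{h'}^1}) \cap P_I(v^{L,T_{CT_l}}_{j_{i_0}^1})\big| \\
&= \Big|\Big(\bigcup_{1\le i\le h'-1} P_I(v^{L,T_{CT_l}}_{j_i^1})\Big) \cup P_I(v^{L,T_{CT_l}}_{j_{h'}^1})\Big|
= \Big|\bigcup_{1\le i\le h'} P_I(v^{L,T_{CT_l}}_{j_i^1})\Big|.
\end{aligned}
\]

This completes the induction. □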
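The statement of Lemma A.2.3 can be observed on a toy instance. The sketch below is an illustration with assumed names and an assumed node encoding (internal nodes named by plaintext prefixes), and it fixes $b = 0$. It takes two plaintext sequences with the same pairwise longest-common-prefix pattern, encrypts the first under an arbitrary key, and counts by enumeration how many tree-based functions map each sequence to the observed ciphertexts.

```python
from itertools import product


def internal_nodes(l):
    """Internal nodes of a complete binary tree of height l, named by prefix."""
    return ["".join(b) for d in range(l) for b in product("01", repeat=d)]


def encrypt(labels, x):
    """Apply the tree-based function given by the ciphertext INL labels."""
    return "".join(str(labels[x[:u]] ^ int(x[u])) for u in range(len(x)))


def count_functions(l, xs, ys):
    """Number of tree-based functions that map xs[i] to ys[i] for every i."""
    nodes = internal_nodes(l)
    count = 0
    for bits in product((0, 1), repeat=len(nodes)):
        labels = dict(zip(nodes, bits))
        if all(encrypt(labels, x) == y for x, y in zip(xs, ys)):
            count += 1
    return count


def union_of_paths(xs):
    """Internal nodes lying on the root-to-leaf paths of the given plaintexts."""
    return {x[:u] for x in xs for u in range(len(x))}


if __name__ == "__main__":
    l = 3
    x0 = ["000", "001", "100"]   # queried plaintexts x_i^0
    x1 = ["110", "111", "010"]   # queried plaintexts x_i^1, same LCP pattern
    key = {v: len(v) % 2 for v in internal_nodes(l)}  # arbitrary example key
    ys = [encrypt(key, x) for x in x0]                # challenge with b = 0
    print(count_functions(l, x0, ys), count_functions(l, x1, ys))
```

Both counts equal $2^{7-5} = 4$ here, because each sequence covers five internal nodes; the observed ciphertexts therefore give no information about which sequence was encrypted.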

In Theorem A.2.4, we prove the security of the ideal PPE object.

Theorem A.2.4: The ideal PPE object $SE^*$ is secure under IND-PCPA.

Proof. According to Proposition A.2.1 and Lemma A.2.3, the number of prefix-preserving functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of prefix-preserving functions mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$. Therefore $\Pr\big(\mathrm{Exp}^{\text{IND-PCPA-}b}_{SE^*,A} = 1\big) = \frac{1}{2}$ for $b = 0, 1$. Hence,

\[
\mathrm{Adv}^{\text{IND-PCPA}}_{SE^*,A} = \Pr\big(\mathrm{Exp}^{\text{IND-PCPA-}1}_{SE^*,A} = 1\big) - \Pr\big(\mathrm{Exp}^{\text{IND-PCPA-}0}_{SE^*,A} = 1\big) = 0,
\]

which implies that the ideal PPE object $SE^*$ is secure under IND-PCPA. □


REFERENCES

[1] M. Abd-El-Malek,‎W.V.‎Courtright,‎C.‎Cranor,‎et‎al.,‎“Ursa‎minor:‎versatile‎cluster-based

storage,” in USENIX Conference on File and Storage Technology, pp. 13-16, 2005.

[2] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in ACM SIGMOD

International Conference on Management of Data, pp. 439-450, 2000.

[3] R.‎Agrawal,‎ J.‎Kiernan,‎R.‎Stikant,‎ and‎Y.‎Xu,‎ “Order-preserving encryption for numeric

data,”‎in SIGMOD’04, pp. 563-574, 2004.

[4] G.‎Amanatidis,‎A.‎Boldyreva‎and‎A.‎O’Neill,‎“Provably-Secure Schemes for Basic Query

Support in Outsourced Databases,” in Working Conference on Data and Applications

Security 2007 Proceedings, Lecture Notes in Computer Science, Vol. 4602, pp. 14-30,

2007.

[5] F. Armknecht, D. Augot, L. Perret, A. Sadeghi, “On‎Constructing‎Homomorphic‎Encryption‎

Schemes from Coding Theory,” in IMA Int. Conf, pp. 23-40, 2011.

[6] G.‎Bebek,‎“Anti-tamper database research: Inference control techniques,” Technical Report

EECS 433 Final Report, Case Western Reserve University, 2002.

[7] M.‎Bellare,‎ T.‎Kohno,‎ and‎C.‎Namprempre,‎ “Authenticated‎ encryption‎ in‎ SSH:‎ provably‎

fixing‎ the‎ SSH‎ binary‎ packet‎ protocol,”‎ in‎ Proceedings of the 9th ACM conference on

Computer and Communications Security (CCS-02), pp. 1-11, 2002.

[8] M. Bellare, A. Boldyreva,‎ and‎ A.‎ O'Neill,‎ “Deterministic‎ and‎ efficiently‎ searchable‎

encryption,”‎in‎CRYPTO'07, pp. 535-552, 2007.

[9] M.‎ Bellare,‎ M.‎ Fischlin,‎ A.‎ O'Neill,‎ and‎ T.‎ Ristenpart,‎ “Deterministic‎ encryption:‎

Definitional equivalences and constructions without random oracles,”‎ in‎CRYPTO'08, pp.

360-378, 2008.

[10] E. Bertino, “Database security - concepts, approaches, and challenges,” in Dependable and

Secure Computing, IEEE Transactions on, Vol.2, pp. 2-19, 2005.

[11] A.‎Boldyreva,‎S.‎Fehr,‎and‎A.‎O'Neill,‎“On‎notions‎of‎security‎for‎deterministic encryption,

and‎efficient‎constructions‎without‎random‎oracles,”‎in‎CRYPTO'08, pp. 335-359, 2008.

[12] A.‎Boldyreva,‎N.‎Chenette,‎Y.‎Lee,‎A.‎O'Neill,‎“Order-Preserving Symmetric Encryption,”

in Advances in Cryptology - Eurocrypt'09, 2009.

[13] A.‎Boldyreva,‎N.‎Chenette,‎A.‎O'Neill,‎“Order-Preserving Encryption Revisited: Improved

Security Analysis and Alternative Solutions,” in Advances in Cryptology - Crypt'11, 2011.

[14] D. Boneh, E. Goh, K. Nissim, “Evaluating 2-DNF Formulas on Ciphertexts,” in TCC, pp.

325-341, 2005.

[15] D. Boneh and B. Waters, “Conjunctive, subset, and range queries on encrypted data,” in

TCC, pp. 535-554, 2007.

[16] Z. Brakerski, C. Gentry, V. Vaikuntanathan, “Fully Homomorphic Encryption without

Bootstrapping,” in Electronic Colloquium on Computational Complexity (ECCC) 18: 111,

2011.

[17] S. Bulygin, T. Rai, “Countering Chosen-Ciphertext Attacks against Noncommutative Polly

Cracker Cryptosystems,” Special Semester on Gröbner Bases, Linz, Austria, 2006.

[18] R. Cramer, I. Damgård, U. Maurer, “General secure multi-party computation from any

linear secret-sharing scheme,” in Advances in Cryptology - EUROCRYPT 2000, Lecture

Notes in Computer Science, Springer-Verlag, Vol. 1807, pp. 316-334, 2000.

[19] R. Cramer, I. Damgård, J. Nielsen, “Multiparty Computation, an Introduction,” 2009,

available from http://cs.au.dk/~jbn/smc.pdf.

[20] Y. Desmedt, “Society and group oriented cryptography: a new concept,” in Advances in

Cryptology - CRYPTO '87, Springer-Verlag LNCS 293, pp. 120-127, 1987.

[21] Y. Desmedt and Y. Frankel, “Threshold Crypto-Systems,” in Advances in Cryptology -

CRYPTO '89, Springer-Verlag LNCS 435, pp. 307-315, 1989.

[22] Y. Dodis, L. Reyzin, A. Smith, “Fuzzy extractors: How to generate strong keys from

biometrics and other noisy data,” in SIAM Journal on Computing, Vol. 38, No. 1, pp. 97-139, 2008.

[23] M.C. Doganay, T.B. Pedersen, Y. Saygin, E. Savas, and A. Levi, “Distributed

Privacy Preserving k-Means Clustering with Additive Secret Sharing,” in International

Workshop on Privacy and Anonymity in the Information Society (PAIS), 2008.

[24] J. Dowd, S. Xu, W. Zhang, “Privacy-preserving decision tree mining based on random

substitutions,” in International Conference on Emerging Trends in Information and

Communication Security, 2005.

[25] S. Dziembowski, K. Pietrzak, “Leakage-Resilient Cryptography,” in FOCS '08, pp. 293-302, 2008.

[26] R. Endsuleit, W. Geiselmann, R. Steinwandt, “Attacking a Polynomial-Based Cryptosystem:

Polly Cracker,” in International Journal of Information Security, Vol. 1, No. 3, pp. 143-148,

2002.

[27] M. Fellows and N. Koblitz, “Combinatorial Cryptosystems Galore!,” in Finite fields: theory,

applications, and algorithms, Vol. 168, pp. 51-61, 1994.

[28] W. Geiselmann, R. Steinwandt, “Cryptanalysis of Polly Cracker,” in IEEE Transactions on

Information Theory 48(11), pp. 2990-2991, 2002.

[29] P. Gemmell, “An introduction to threshold cryptography,” in Cryptobytes, pp. 7-12, 1997.

[30] R. Gennaro, S. Jarecki, H. Krawczyk and T. Rabin, “Robust and efficient sharing of RSA

functions,” in Advances in Cryptology - CRYPTO '96, Springer-Verlag LNCS 1109, pp.

157-172, 1996.

[31] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in 41st ACM Symposium on

Theory of Computing (STOC), 2009.

[32] C. Gentry, “A working implementation of fully homomorphic encryption,” available from

http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf.

[33] GNU MP library, available from http://gmplib.org/.

[34] O. Goldreich, S. Micali, and A. Wigderson, “How to play ANY mental game,” in

Proceedings of the nineteenth annual ACM conference on Theory of computing, pp. 218-

229. ACM Press, 1987.

[35] O. Goldreich, “Foundations of Cryptography: Volume 1, Basic Tools,” Cambridge

University Press, ISBN-10: 0521035368, 2007.

[36] O. Goldreich, “Foundations of Cryptography: Volume 2, Basic Applications,” Cambridge

University Press, ISBN-10: 052111991X, 2009.

[37] S. Goldwasser, S. Micali, “Probabilistic Encryption,” in Special issue of Journal of

Computer and Systems Sciences, Vol. 28, No. 2, pp. 270-299, 1984.

[38] S.C. Gultekin Ozsoyoglu, D. Singer, “Anti-tamper databases: Querying encrypted

databases,” in Conference on Database and Applications Security, 2003.

[39] H. Hacıgümüş, B.R. Iyer, C. Li, and S. Mehrotra, “Executing SQL over encrypted data in

the database-service-provider model,” in Proceedings of the ACM SIGMOD Conf. on

Management of Data, Madison, Wisconsin, 2002.

[40] H. Hacıgümüş, B.R. Iyer, and S. Mehrotra, “Efficient Execution of Aggregation Queries

over Encrypted Relational Databases,” in Database Systems for Advanced Applications,

Vol. 2973, pp. 633-650, 2004.

[41] M. Halloush and M. Sharif, “Global heuristic search on encrypted data (GHSED),” in

International Journal of Computer Science Issues (IJCSI), Vol. 1, pp. 13-17, 2009.

[42] D. Hofheinz, R. Steinwandt, “A Differential Attack on Polly Cracker,” in Proceedings of

IEEE International Symposium on Information Theory, pp. 211, 2002.

[43] Z. Huang, W. Du, and B. Chen, “Deriving private information from randomized data,” in

ACM SIGMOD International Conference on Management of Data, pp. 37-47, 2005.

[44] M. Kane and D. Kawamoto, “Oracle buys PeopleSoft for $10 billion,” available from

http://news.cnet.com/Oracle-buys-PeopleSoft-for-10-billion/2100-1001_3-5488298.html.

[45] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, “On the privacy preserving properties

of random data perturbation techniques,” in IEEE International Conference on Data Mining,

2003.

[46] J. Katz, Y. Lindell, “Introduction to Modern Cryptography: Principles and Protocols,”

Chapman & Hall/CRC, 2007.

[47] F. Levy-dit-Vehel, L. Perret, “A Polly Cracker System Based on Satisfiability,” in Progress

in Computer Science and Applied Logic, pp. 177-192, 2004.

[48] J. Li, E.R. Omiecinski, “Efficiency and security trade-off in supporting range queries on

encrypted databases,” in Data and Applications Security, pp. 69-83, 2005.

[49] M. Luby and C. Rackoff, “How to construct pseudo-random permutations from pseudo-

random functions,” in SIAM Journal on Computing, Vol. 17, No. 2, pp. 373-386, 1988.

[50] L. Ly, “Polly Two: A New Algebraic Polynomial-Based Public-Key Scheme,” in

Applicable Algebra in Engineering, Communication and Computing, Vol. 17, pp. 267-283,

2006.

[51] M.M. Mano, “Digital Design,” Prentice Hall, 3rd edition, August 1, 2001.

[52] M. Naor, B. Pinkas, O. Reingold, “Distributed Pseudo-Random Functions and KDCs,” in

Advances in Cryptology EUROCRYPT'99, pp. 327-346, 1999.

[53] G. Ozsoyoglu, D. Singer, S.S. Chung, “Anti-tamper databases: Querying encrypted

databases,” in Proceedings of the 17th Annual IFIP WG 11.3 Working Conference on

Database and Applications Security, Estes Park, Colorado, 2003.

[54] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and

architectures: A tutorial and survey,” in IEEE Journal of Solid-State Circuits, Vol. 41, No.

3, pp. 712–727, 2006.

[55] P. Paillier, “Public-Key Cryptosystems Based on Composite Degree Residuosity Classes,” in

EUROCRYPT’99, pp. 223-238, 1999.

[56] T. Pedersen, “A threshold crypto-system without a trusted dealer,” in Advances in

Cryptology - EUROCRYPT '91, Springer-Verlag LNCS 547, pp. 522-526, 1991.

[57] PlanetLab, available from http://www.planet-lab.org.

[58] P.K. Prasad, C.P. Rangan, “Privacy Preserving BIRCH Algorithm for Clustering over

Arbitrarily Partitioned Databases,” in ADMA, pp. 146-157, 2007.

[59] T. Rai, “Infinite Gröbner bases and Noncommutative Polly Cracker Cryptosystems,” PhD

Thesis, Virginia Polytechnic Institute and State Univ, 2004.

[60] R.L. Rivest, A. Shamir, L. Adleman, “A Method for Obtaining Digital Signatures and

Public-Key Cryptosystems,” in Communications of the ACM 21 (2), pp. 120–126, 1978.

[61] R.L. Rivest, L. Adleman and M.L. Dertouzos, “On data banks and privacy

homomorphisms,” in Foundations of Secure Computation, eds. R. A. DeMillo et al.,

Academic Press, pp. 167-179, 1978.

[62] T. Sander, A. Young, M. Yung, “Non-Interactive CryptoComputing For NC1,” in FOCS'99,

pp. 554-567, 1999.

[63] A. Shamir, “How to share a secret,” in Communications of the ACM, Vol. 22, Issue 11, pp.

612-613, 1979.

[64] E. Shi, J. Bethencourt, T-H. H. Chan, D. Song, and A. Perrig, “Multi-dimensional range

query over encrypted data,” in Symposium on Security and Privacy, pp. 350-364, 2007.

[65] N. P. Smart and F. Vercauteren, “Fully homomorphic encryption with relatively small key

and ciphertext sizes,” in PKC’10, pp. 420-443, 2010.

[66] D.X. Song, D. Wagner, A. Perrig, “Practical techniques for searches on encrypted data,” in

IEEE Symposium on Security and Privacy, pp. 44-55, 2000.

[67] D. Stehlé and R. Steinfeld, “Faster Fully Homomorphic Encryption,” in Cryptology ePrint

Archive: Report 2010/299, 2010.

[68] R. Steinwandt, “A Ciphertext-Only Attack on Polly Two,” preprint, 2006.

[69] B.M. Thuraisingham, “Multilevel Secure Database Management System,” in Encyclopedia

of Database Systems, pp. 1789-1792, 2009.

[70] J. Vaidya, C. Clifton, “Privacy-preserving k-means clustering over vertically partitioned

data,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge

discovery and data mining, pp. 206-215, 2003.

[71] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully homomorphic encryption

over the integers,” in EUROCRYPT'10 Proceedings of the 29th Annual international

conference on Theory and Applications of Cryptographic Techniques, pp. 24-43, 2010.

[72] L. Xiao, I. Yen, “A Note for the Ideal Order-Preserving Encryption Object and Generalized

Order-Preserving Encryption,” http://eprint.iacr.org/2012/350.pdf.

[73] L. Xiao, I. Yen, “Security Analysis and Enhancement for Prefix-Preserving Encryption

Schemes,” submitted to Asiacrypt’12, http://eprint.iacr.org/2012/191.pdf.

[74] L. Xiao, I. Yen, D.T. Huynh, “Extending Order Preserving Encryption for Multi-User

Systems,” submitted to Infocom’13, http://eprint.iacr.org/2012/192.pdf.

[75] L. Xiao, I. Yen, D. Lin, “Security Analysis for an Order Preserving Encryption Scheme,”

Tech Report UTDCS-06-10, 2010, available from http://utdallas.edu/~xll052000/OPEproof-TR1.pdf,

revised version: http://utdallas.edu/~xll052000/OPEproof-TR2.pdf.

[76] L. Xiao, O. Bastani, I. Yen, “An Efficient Homomorphic Encryption Protocol for Multi-

User Systems,” submitted to ICDE’13, http://eprint.iacr.org/2012/193.pdf.

[77] L. Xiao, O. Bastani, I. Yen, “Security Analysis for Order Preserving Encryption Schemes,”

in CISS, 2012.

[78] J. Xu, J. Fan, M.H. Ammar, and S.B. Moon, “Prefix-preserving IP address anonymization:

Measurement-based security evaluation and a new cryptography-based scheme,” in IEEE

International Conference on Network Protocols, pp. 280-289, 2002.

[79] L. Xu, “Hydra: a platform for survivable and secure data storage systems,” in Proceedings

of the 2005 ACM workshop on Storage security and survivability, pp. 108-144, 2005.

[80] A.C. Yao, “Protocols for Secure Computations (Extended Abstract),” FOCS, pp. 160-164,

1982.

VITA