protocols for secure remote database access with approximate...

Protocolsfor SecureRemoteDatabaseAccesswithApproximateMatching

�

WenliangDuCERIASand

Departmentof ComputerSciencesPurdueUniversity

WestLafayette,IN 47907Email: [email protected]

Tel: (765)496-6765

Mikhail J. AtallahCERIASand



Tel: (765)494-6017

Florian KerschbaumCERIASand



Tel: (765)496-6766

Abstract

Supposethat Bob hasa database�

and that Alice wantsto performa search query � on�

(e.g.,“is � in

�?”). SinceAlice is concernedabouther privacy, shedoesnot want Bob to knowthe query� or the responseto the query. How could this be done? There are elegant cryptographic techniques

for solvingthis problemundervariousconstraints(such as“Bob shouldknowneither � nor theanswerto the query” and “Alice shouldlearn nothingabout

�other than the answerto the query”), while

optimizingvariousperformancecriteria (e.g., amountof communication).Weconsidertheversionof thisproblemwherethequeryis of thetype“is � approximatelyin

�?” for

a numberof differentnotionsof “approximate”, someof which arise in image processingandtemplatematching, while others arise in biological sequencecomparisons.New techniquesare neededin thisframeworkof approximatesearching, becauseeach notionof “approximateequality” introducesits ownsetof difficulties; usingencryptionis more problematicin this framework becausethe itemsthat areapproximatelyequalceaseto besoafter encryptionor cryptographichashing. Practical protocolsforsolvingsuch problemsmake possiblenew formsof e-commerce betweenproprietary databaseownersandcustomerswhoseekto querythedatabase, with privacy.

We first presentfour secure remotedatabaseaccessmodelsthat could be usedin an e-commerceframework,each of which hasdifferentprivacyrequirement.Wethenpresentour solutionsfor achievingprivacyin each of thesefour models.

Keywords: Privacy, security, securemulti-partycomputation,patternmatching,approximatepatternmatch-ing

�Portionsof this work weresupportedby GrantEIA-9903545from theNationalScienceFoundation,andby sponsorsof the

Centerfor EducationandResearchin InformationAssuranceandSecurity.

1

1 Intr oduction

Considerthefollowing real-lifescenario:Alice thinksthatshemayhavesomegeneticdisease,soshewantsto investigateit further. Shealsoknows that Bob hasa databasecontainingknown DNA patternsaboutvariousdiseases.After Alice getsa sampleof herDNA sequence,shesendsit to Bob, who will thentellAlice thediagnosis.However, if Alice is concernedaboutherprivacy, theabove processis not acceptablebecauseit doesnotpreventBob from knowing Alice’s privateinformation–boththequeryandtheresult.

This kind of situation,which is likely to arisease-commercedevelops,motivatesthefollowing generalproblemformulation:

SecureDatabaseAccess(SDA) Problem:Alicehasa string � , andBobhasa databaseof strings�� ; Alice wantsto knowwhetherthere existsa string

��in Bob’s databasethat

“matches” � . The“match” could bean exactmatch or an approximate(closest)match. Theproblemis howto designa protocolthataccomplishesthistaskwithoutrevealingto BobAlice’ssecret query � or theresponseto that query.

Becauseof its practicalimportanceandalsobecausenot muchwork hasbeendonefor approximatepatternmatchingin theSDA context, ourwork particularlyfocuseson approximatepatternmatching.

Theexactmatchingproblemhasbeenextensively consideredin theliterature[20, 5, 17, 16, 21, 23, 22,12], eventhoughit cantheoreticallybesolvedusingthegeneraltechniquesof securemulti-partycomputa-tion [9]. Themotivationfor giving thesespecializedsolutionsto it is thatthey aremoreefficient thanthosethatfollow from theabove-mentionedgeneraltechniques.This is alsoourmotivationin consideringapprox-imatepatternmatchingeven thoughit too is a specialcaseof thegeneralsecuremulti-party computationproblem.Unlike exactpatternmatchingthatproduces“yes” and“no” answers,approximatepatternmatch-ing measuresthedifferencebetweenthetwo targets,andproducesascore to indicatehow differentthetwotargetsare.Themetricsusedto measurethedifferenceusuallyareheuristicandareapplication-dependent.For example,in imagetemplatematching[13, 18], ��! �#"%$ �'&)( and ��+* �,"%$ � * areoftenusedto mea-surethedifferencebetweentwo sequences

and

$. In DNA sequencematching[14], edit distance[1, 6]

makesmoresensethantheabove measurements;edit distancemeasuresthecostof transformingonegivensequenceto anothergivensequence,andits specialcase,longestcommonsubsequenceis usedto measurehow similar two sequencesare.

Solving approximatepatternmatchingproblemswithin theSDA framework is quite a nontrivial task.Considerthe �� * �-".$/� * metric as an example. The known PIR (private information retrieval) tech-niques[20, 5, 17, 16, 21, 23, 22, 12] canbeusedby Alice to efficiently accesseachindividual

$/�without

revealingto Bob anything aboutwhich$ �

(or evenwhich$) Alice accessed(moreon this later),but doing

this for eachindividual$ �

andthencalculating �0�� * � "1$ � * violatesthe requirementthat Alice shouldknow the total score � ��* ��"2$/� * withoutknowinganythingother than that score, i.e., without learninganything abouttheindividual

$/�values.Usinga generalsecuremulti-partycomputationprotocoltypically

doesnot leadto anefficient solution.Thegoalsof our research,andtheresultspresentedin this paper, arefindingefficient waysto dosuchapproximatepatternmatchingswithoutdisclosingprivateinformation.

The practicalmotivationsof remotedatabaseaccessdo not all point to the modelwe describedin theabove SDA formulation. For example,in somesituations,Bob’s databasecouldbeproprietarywhereasinsomeothersit couldbepublic(in eithercasetheprotocolshouldrevealnothingto BobaboutAlice’squery).The“proprietary” natureof a databasemight make thesolutionmoredifficult becauseAlice shouldnot beableto know moreinformationthanthe responseto herquery. Thereis alsoanotherpracticalframework,within which Alice usesBob to storea (suitably disguised)versionof her private database(a form ofoutsourcing),and for sucha framework the solutionscould be quite different. Basedon thesevariantsof the problem,we have investigatedfour SDA models,and defineda classof SDA problemsfor each

2

modelaccordingto themetricswe usefor approximatepatternmatching.Of coursethedifficultiesof theproblemsarenot thesamefor thedifferentmetrics,andsofar we have solved a subsetof thoseproblems.A summaryof our resultsis listedbelow (theresultsarestatedmorepreciselyin Section4, andthemodelsaredefinedin Section3 – in themeantimeseeFigure1 in thatsectionfor asummaryof eachmodel).

3 For thePrivateInformationMatchingModel,wehaveasolutionto theapproximatepatternmatchingbasedonthe � �� ! �4"5$/�'&)( metricwith 6 �87:9<; & communicationcost,where

7is thelengthof each

stringand;

is thenumberof stringsin thedatabase.

3 For the Private Information Matching Model, we also have a solution to the approximatepatternmatchingbasedon the �0�� * �=">$ � * metric usinga Monte Carlo technique;the solutiongivesanestimatedresult,andit has 6 �87?9A@B9C; &

communicationcost,where@

is a parameterthataffectstheaccuracy of theestimate.

3 For thePrivateInformationMatchingModel, if weassumethatthealphabetis known to theinvolvedpartiesand its size is finite, we have a solution to approximatepatternmatchingbasedon general�D��FE �! � �/$ � & metrics,hencethesolutionsfor thespecialcasesof �0�� * � "2$ � * , �0�� ! � "2$ � & ( ,and � ��G��IHJ�! �)�/$ �K& (where

HJ�8L ��MN&is 1 if

L �1Mand0 otherwise).Thesesolutionshave 6 �8OP9+7Q9R; &

communicationcost,whereO

is the numberof the symbolsin the alphabet. In many cases,O

issmall.For instance,

Ois four in DNA databases.

3 For theSecureStorageOutsourcingModel,wehaveapracticalsolutionto approximatepatternmatch-ing basedon the � ��! ��"S$/�T&)( metric. Thesolutionis practicalbecauseits 6 �87 & communicationcostdoesnotdependon

;.

3 For the SecureStorageOutsourcingandComputationModel, we alsohave a practicalsolution toapproximatepatternmatchingbasedon the � ��! �U"V$ �T&)( metric.This solutionis practicalbecauseof its communicationcostis 6 �87 ( & .

Moti vation

Why do we careaboutthe privacy of a databasequery? In the exampleusedearlier in this section,if amatchis found in the database,Bob immediatelyknows that Alice hassucha disease;even worse,afterreceiving Alice’s DNA sequence,Bob canderive muchelseaboutAlice, suchasotherhealthproblemsthatAlice mighthave. If Bob is not trustworthy, BobcoulddisclosetheinformationaboutAlice to otherparties,andAlice mighthavedifficulty gettingemployment,insurance,credit,etc.But evenif Alice trustsBob,andBob hasno intentionof disclosingAlice’s privateinformation,Bob himselfmight preferthatAlice’s querybe kept privateout of liability concerns:If Bob knows Alice’s DNA information,andthat informationisaccidentallydisclosed(perhapsby a disgruntledemployeeof Bob’s,or aftera systembreak-in),Bob mightfaceanexpensive lawsuit from Alice. Fromthis perspective, a trustedBob will actuallyprefernot to knoweitherAlice’s queryor its response.

With thegrowth of theInternet,moreandmoree-commercetransactionslike theabove will take place.TherearealreadyDNA patterndatabases,publicdatabasesaboutdiseases,patentdatabases,andin thefuturewe mayseemany morecommercialdatabasesandtherelateddatabaseaccessservices,suchasfingerprintdatabases,signaturedatabases,medicalrecorddatabases,andmany more. Privacy will be a major issue,andassumingthetrustworthinessof theserviceproviders,asis donetoday, is risky; thereforeprotocolsthatcansupportremoteaccessoperationswhile protectingtheclient’s privacy areof growing importance.

Oneof thefundamentaloperationsbehindthequeriesdescribedin theexamplesabove is patternmatch-ing. Therefore,thebasicproblemthatwe faceis how to conductpatternmatchingoperationsat theserver

3

sidewhile theserver hasno knowledgeof theclient’s actualquery(or theresponseto it). In somedatabaseaccesssituations,exactpatternmatchingis used,suchasqueryby name,queryby socialsecuritynumber,etc. However, in many othersituations,exact patternmatchingis unrealistic. For instance,in fingerprintmatching,even if two fingerprintscomefrom thesamefinger, they areunlikely to beexactly thesamebe-causethereis someinformationlossin theprocessof deriving anelectronicform (usuallya complex datastructureof features)from a raw fingerprintimage. Similarly in voice, face,andDNA matching;in theseandmany othersituations,exactmatchingis not expectedandsomeform of approximatepatternmatchingis moreuseful.

Background Inf ormation on Secure Multi-party Computation

Theabove problemis aspecialcaseof thegeneralsecuremulti-partycomputationproblem[34]. Generallyspeaking,amulti-partycomputationproblemdealswith computingany probabilisticfunctiononany input,in adistributednetwork whereeachparticipantholdsoneof theinputs,ensuringindependenceof theinputs,correctnessof thecomputation,andthatnomoreinformationis revealedto aparticipantin thecomputationthancanbecomputedfrom thatparticipant’s input andoutput[11]. Otherexamplesof suchcomputationsinclude: electionsover the Internet,electronicbidding, joint signatures,andjoint decryption.Thehistoryof themulti-party computationproblemis extensive sinceit wasintroducedby Yao [34] andextendedbyGoldreich,Micali, andWigderson[24], andby many others: Goldwasser[11] predictsthat “the field ofmulti-partycomputationsis todaywherepublic-key cryptographywastenyearsago,namelyanextremelypowerful tool andrich theorywhosereal-life usageis at this time only beginning but will becomein thefutureanintegral partof our computingreality”.

Goldreichstatesin [9] that the generalsecuremulti-party computationproblemis solvable in theory.However, he alsopointsout that usingthe solutionsderived by thesegeneralresultsfor specialcasesofmulti-party computation,canbe impractical; specialsolutionsshouldbe developedfor specialcasesforefficiency reasons.

Oneof the well-known specialcasesof multi-party computationis the Private InformationRetrieval(PIR) problem:Theproblemconsistsof a client andserver. Theclient needsto get the W th bit of a binarysequencefrom theserver without letting theserver know the W ; theserver doesnot want theclient to knowthe binary sequenceeither. A solution for this problemis not difficult; however an efficient solution, inparticularasolutionwith smallcommunicationcost,is noteasy. Studies[20, 5, 17,16, 21, 23, 22, 12] haveshown thatonecandesignaprotocolto solve thePIRproblemwith muchbettercommunicationcomplexitythanby usingthegeneraltheoreticalsolutions.Patternmatchingis anothersuchspecificcomputation,andtherecentprogressin thePIR problemmotivatedusto speculatethat thereexist efficient solutionsfor thisparticularkind of securemulti-partycomputationaswell.

Secure Multi-party Protocol vs. AnonymousCommunication Protocol

Anonymouscommunicationprotocols[30, 10] weredesignedto achieve somewhat relatedgoals,so whynot usethem? Anonymity techniqueshelp to hide the identity of the informationsender, ratherthantheinformationbeingsent.For example,whenpeoplebrowsetheweb,they canuseanonymouscommunicationtechniquesto keeptheir identitiessecret,but thewebqueryusuallyis not secretbecausethewebserver hasto know thequeryin orderto senda reply back. In situationswherethe identity of the informationsenderneedsto beprotected,anonymouscommunicationprotocolsareappropriate.However, therearesituationswhereanonymouscommunicationprotocolscannotreplacesecuremulti-partycomputationprotocols.First,certaintypesof informationintrinsicallyrevealtheidentityof someonerelatedto theinformation(e.g.,socialsecuritynumber). Secondly, in somesituations,it is the informationitself that needsto be protected,notthe identity of the informationsender. For instance,if Alice hasan invention,shehasto searchif suchan

4

inventionis new beforeshefiles for apatent.Whenconductingthequery, Alice maywantto keepthequeryprivate(perhapsto avoid partof her ideabeingstolenby peoplewho have accessto her query);shedoesnotcarewhetherheridentity is revealed.Thirdly, in certainsituations,onehasto bearegisteredmemberinorderto usethedatabaseaccessservice;this makeshiding a user’s identity difficult becausetheuserhastoregisterandlogin first, whichmight alreadydiscloseheridentity.

Furthermore,most of the known practicalanonymousprotocols,suchas Crowds [30], Onion rout-ing [10] andanonymizer.com, useoneor severaltrustedthird parties.In oursecuremulti-partycompu-tationprotocols,wedonotuseatrustedthird party;whenathird partyis used,wegenerallyassumethatthethird partyis not trusted,andshouldlearnnothingabouteitherAlice’s query, or Bob’s data,or theresponseto thequery.

Thereforeanonymity doesnot totally solve our problems,andcannotreplacesecuremulti-party com-putation.Rather, by combininganonymity techniqueswith securemulti-partycomputationtechniques,onecanachieve betteroverall privacy moreefficiently.

2 RelatedWork

As Goldwasserpointsout in [11], in the1980’s the focusof researchwasto show themostgeneralresultpossible,yielding multi-party protocolsolutionsfor any probabilisticfunction. Much of the currentworkis to focuson efficientandnon-interactivesolutionsto specialimportantproblemssuchasjoint-signatures,joint-decryption,andsecureandprivatedatabaseaccess.

Amongvariousmulti-partycomputationproblems,thePrivateInformationRetrieval (PIR)problemhasbeenwidely studied;it is alsotheproblemmostrelatedto whatwe presentin this paper(althoughhereweusenoneof theeleganttechniquesfor PIR thatarefoundin the literature,for reasonswe explainedearlierin thispaper).ThePIRproblemconsistsof devisinga protocolinvolving auseranda databaseserver, eachhaving asecretinput. Thedatabase’s secretinput is calledthedatastring, an

;-bit string X �Y$��/$ ( �� $Z� .

Theuser’s secretinput is aninteger W between1 and7

. Theprotocolshouldenabletheuserto learn$/�

in acommunication-efficientwayandatthesametimehide W from thedatabase.Thetrivial solutionis having thedatabasesendanencryptionof theentirestring X to theuser, with an 6 �87:9U; & communicationcomplexity.Much work hasbeendonefor reducingthiscommunicationcomplexity [20, 5, 17, 16,21, 23, 22, 12].

Choret al. point out thata majordrawbackof all known PIR schemesis theassumptionthat theuserknows thephysicaladdressof thesoughtitem [8], whereasin thecurrentdatabasequeryscenariotheusertypically holdsa keyword and the databaseinternally converts this keyword into a physicaladdress.Tosolve this problem,Choret al. proposea schemeto privatelyaccessdataby keywords[8]. Thedifferencebetweentheproblemstudiedin Chor’spaperandtheproblemsin ourpaperis thatweextendtheproblemtocover approximatepatternmatching.

Songet al. proposea schemeto conductsearcheson encrypteddata[33]. In that framework, Alicehasa database,andshehasto storethedatabasein a server controlledby Bob; how couldAlice queryherdatabasewithout letting Bob know thecontentsof thedatabaseor thequery?Herewe primarily focusonextendingtheproblemto alsocover approximatepatternmatching.

3 Framework

3.1 Models

Remotedatabaseaccesshasmany variants.In somee-commercemodels,Bob’sdatabaseis privatewhile insomeothermodels,it is public. In the lattercase,thereis no requirementto keepthedatabasesecretfromAlice; however, theprivacy of Alice’s querystill needsto bepreserved. In othere-commercemodels,Bob

5

hostsAlice’s (encrypted/disguised) databasewhile supportingqueriesfrom Alice andothercustomers,inwhichcaseBobshouldknow neitherthedatabasenor thequeries.

Private DatabaseBob’s

BobAlicequery

reply

Public Database

Alice Bobquery

reply

(c) SSO Model

Private DatabaseAlice’s

BobAlicequery

reply

(d) SSCO Model

Private DatabaseAlice’s

BobAlice

Carl

outsourcing

query

replypay for

the service

(a) PIM Model (b) PIMPD Model

Figure1: Models

Fromthevariouswaysthatremotedatabaseaccessisconducted,wedistinguishfourdifferente-commercemodels,all of which requirecustomers’privacy:

3 PIM: PrivateInformationMatchingModel (Figure1.a)

3 PIMPD: PrivateInformationMatchingfrom PublicDatabaseModel (Figure1.b).

3 SSO:SecureStorageOutsourcingModel (Figure1.c).

3 SSCO:SecureStorageandComputingOutsourcingModel (Figure1.d).

For the sake of convenience,we will use [ ]\Z^ � & to representthe patternmatchingfunction, whichincludesbothexactpatternmatchingandapproximatepatternmatching.

Private Inf ormation Matching Problem (PIM)

Alice hasa stringL, andBobhasadatabaseof strings

�_�� ; Alice wantsto know theresultof

[ ]\�^ �8L �]�`& . Becauseof theprivacy concern,Alice doesnotwantBobto know thequeryL

or theresponseto thequery;BobdoesnotwantAlice to know any stringin thedatabaseexceptfor whatcanbederivedfromthereply. Furthermore,Bobwantsto makemoney from providing suchaservice,thereforeAlice shouldnotbeableto conductthequeryingby herself;in otherwords,every time Alice wantsto performsuchaquery,shehasto contactBob,otherwiseshecannotgetthecorrectanswer.

Private Inf ormation Matching from Public DatabaseProblem(PIMPD)

Bob hasa databaseof strings��a��/��b�

, whosecontentsarepublic knowledge.Alice hasa queryL, andshewantsto know the resultof [ ]\�^ �8L �]��& without disclosingto Bob eitherher query

Lor the

responseto it.Thisproblemis differentfrom thePIM problem:In thePIM problem,Bobdoesnotallow Alice to know

any informationaboutthedatabaseexceptfor whatcanbederivedfrom thereply. In thePIMPD problem,sincethedatabasecontainsonly publicknowledge,thereis noneedto preventBob from lettingAlice know

6

moreaboutthecontentsof thedatabasethanthestrict answerto herquery(althoughBob’s doingsomayresultin unnecessarycommunication).

Secure StorageOutsourcing Problem (SSO)

Alice hasa databaseof strings�c�B��/��]�`�

, but shedoesnot have enoughstoragefor the largedatabase,so sheoutsourcesher database(suitably disguised–moreon this later) to Bob, who providesenoughstoragefor Alice. Furthermore,from time to time, Alice needsto queryherdatabaseandretrievesthe informationthatmatchesherquery, i.e., Alice wantsto know [ ]\Z^ �8L �]��& for herquery

L. As usual,

Alice wantsto keepthecontentsof boththedatabaseandthequerysecretfrom Bob.

Secure Storageand Computing Outsourcing Problem(SSCO)

TheSSCOproblemis anextensionof theSSOproblem.Whereasonly Alice queriesherdatabasein theSSOproblem,in theSSCOmodelthedatabasewill alsobequeriedby otherclientsof Alice. Morespecifically, intheSSCOmodel,Alice outsourcesherdatabaseto Bob,andshewantsthedatabaseto beavailableto anyonewho is willing to payherfor thedatabaseaccessservice.Whenaclientaccessesthedatabase,neitherAlicenor Bob shouldknow thecontentsof thequery. Moreover, Alice wantsto chargetheclientsfor eachquerythey have submitted,so theclient shouldnot beableto get thecorrectqueryresultif Alice is not awareofthequery’s existence.

SinceBob canpretendto bea client, thesolutionsof theSSCOproblemshouldbesecureeven if BobcancolludeagainstAlice with any client. However, theSSOproblemdoesnothavesuchaconcernbecausetheonly client is Alice herself.

3.2 Notation

Foreachmodel,thereis afamily of problems.Wewill usethefollowing notationsto representseachspecificproblem:

3 [ /Exact:ExactPatternMatchingproblemin model [ .

3 [ /Approx: ApproximatePatternMatchingproblemin model [ .

– [ /Approx/E

: use � � d ��FE �! d �/$ d & metricto measurethedistancebetweentwo strings,whereE

is ageneralfunction.

– [ /Approx/H: use � � d �� HJ�! d �/$ d & metric to measurethedistancebetweentwo strings,where

His theKronecker symbol:

HJ�8L ��MN&= 0 if andonly if

L �1Mand1 otherwise.

– [ /Approx/Abs:use � � d �� * d "V$ d * metricto measurethedistancebetweentwo strings.

– [ /Approx/Squ:use � � d ��! d "V$ d &)( metricto measurethedistancebetweentwo strings.

– [ /Approx/Edit:9 [ /Approx/Edit/String:usethestringeditingcriterion[6] to measurethedistancebetweentwo strings.9 [ /Approx/Edit/Tree: usethe treeediting criterion to measurethe distancebetweentwotrees.

The [ /Exactproblemhasbeenstudiedextensively in certainmodel,suchasPIM andSSO,but the[ /Approx problemhasnot. Our resultsdealmostlywith the [ /Approx problem.

7

4 ProtocolsUsingUntrusted Third Party

This sectionpresentsprotocolsthat usean untrustedthird party. The next section(Section5) shows howto getrid of Ursulaandgetstrictly 2-partysolutions,but at theconsiderablecostof repeatedlyusingYao’smillionaire protocol[34], andalsothe(somewhat)expensive oblivioustransfertechnique[29, 26].

4.1 Preliminary

Protocol for scalar (and other) products

Recallthatthescalarproductof two vectors eL � �8L ��G�G�G� L �&

and eMf� � Mg��G�G�G��M �&

is:eLih eM:� � � d �� L d 9 M d .We describea protocolfor Alice andBob to computethescalarproductof Alice’s vector eL andBob’s

vector eM usinganuntrustedthird partyUrsula.NeitherAlice norBob shouldlearnanything abouttheotherparty’s input (otherthanwhatcanbederived from knowing eL?h eM ), andUrsulashouldlearnnothingabouteither eL or eM . This protocolwill later serve asa building block for otherprotocols. Essentiallythe sameprotocolalsosolvestheasymmetricversionof theproblem,in whichonly Alice is to know eLih eM .

1. Alice andBob jointly generatetwo randomnumbersj and j#k .2. Alice andBob jointly generatetwo randomvectors el , el k (of size

7).

3. Alice sends em � � eLfn el and o � � eLQh el k n j to Ursula.

4. Bob sends em ( � eM n el k and o ( � el hp� eM n el k & n jqk to Ursula.

5. Ursulacomputesr � em � h em ( " o �s" o ( andgets r � eL?h eMQ" � j n j k & ; shethensendthe resulttoAlice andBob. Note. In theasymmetricversionof theproblem(in whichonly Alice is to know eLth eM )Ursuladoesnot sendanything to Bob in thisstep.

6. Alice and(in thesymmetricversionof theproblem)Bob thenget eLQh eMf� r n.� j n j k & .Notethatthecommunicationcomplexity of theabove is linearin thesizeof theinputs,i.e., it is 6 �87 & .Althoughonly thescalarproductcaseis neededlaterin thispaper, it shouldbeclearthatotheroperations

thanscalarproductcanbecarriedoutusingsuitablymodifiedversionsof theaboveprotocol.Theseincludematrix product,in whichcaseAlice andBob have matricesand

l � l k � j � j#k arerandommatrices.They alsoincludethe convolution productof two vectors,in which case

l � l k � j � j k arerandomvectors.This couldpotentiallybeusefulin othercontexts.

A lesspracticalprotocolthatdoesnot requireusingUrsulawill begivenin Section5.

4.2 PIM/A pprox

Exceptfor the researchon the generalsecuremulti-party computationproblem,this specificproblemhasnotbeenstudiedin theliterature.Unlessotherwisespecified,we assumethealphabetusedin thefollowingsolutionto be predefinedandits sizeto befinite. This assumptionis quite reasonablein many situations;for instance,DNA sequencesuseafixedalphabetof four symbols.Underthisassumption,wecansolve thePIM/Approx/

Eproblem.However, becausetheway to calculateedit distancecannotberepresentedin the

form � � d �� E �! d �/$ d & , thePIM/Approx/Editproblemis notaspecialcaseof thePIM/Approx/E

problem.In someothersituations,theabove finite alphabetassumptiondoesnot apply. For instance,fingerprint,

imageandvoicepatternsuserealnumbersinsteadof charactersfrom a known finite alphabet.Theabove-mentionedsolution for the PIM/Approx/

Eproblemcannotbe usedanymore, however by exploiting the

8

mathematicalpropertyof � ��! �u"v$ �'&)( , wehavecomeupwith asolutionfor thePIM/Approx/Squproblemfor infinite alphabetafter introducinganuntrustedthird partywho doesnot know theinputsfrom eitherofthe two partiesand learnsnothingaboutthem(or aboutthe query, or the answerto it). We alsohave asolutionto thePIM/Approx/AbsproblemusingaMonteCarlotechnique.All of thesearegivenbelow.

4.2.1 PIM/A pprox/SquProtocol

SupposethatBobhasadatabase�w�x�� G�G�G��`�

, andassumethelengthof eachstring)�

is7

; Alice wantsto know the

�Ay �thatmostcloselymatchesa query

L � L � �G�G� L � basedon thePIM/Approx/Squmetric.The requirementis that Bob shouldnot know

Lor the result,andAlice shouldnot be ableto learnmore

informationthanthereply from Bob.We proposea protocol to computethe matchingscoreusing an untrustedthird party, Ursula. Our

assumptionhereis thatUrsulawill not conspirewith eitherAlice or Bob. However, the third party is notfully trusted:Ursulashouldnotbeableto deduceeither

Lor�

, or thefinal matchingscoreo . Thisprotocolworksfor bothfinite andinfinite alphabet.

Let eL � � "{z L � ��G�G�G��"{z L ��|�&

; for each � �1M �8}~� �G�G�~M �8}

� , let e� � � � M �8}~� ��G�G�G��M �8} �� 0�d �� M (�8} d & . Observe that:

��d ��8L d "DM �8} d & ( � eLih e� � n ��

d ��L (d �

Since �D�d ��L (d is a constant,we canuse eL?h e� � insteadof �D�d ��8L d "�M �8} d &)( to find theclosestmatch.After wegettheclosestmatch,Alice cancalculatetheactualscoreby adding �D�d ��L (d .Protocol

1. Alice andBob jointly generatetwo randomnumbersj and j k .2. For each

�� y �, repeatthenext fivesub-steps,in which

)��1M,�8}~�Z�G�G�~M,�8}� , eL

� � "{z L ��G�G�G��"{z L ��|�&

.

(a) Bob constructse� �� M,�8}~��G�G�G��M,��} �� 0�d �� M (�8} d & .

(b) Alice andBob jointly generatetwo randomvectors el , el k (of size7in |

).

(c) Alice sends em � � eLfn el and o � � eLQh el k n j to Ursula.

(d) Bob sends em ( � e� � n el k and o ( � el hp� e� � n el k & n j k to Ursula.

(e) Ursulacomputesr �<� em � h em ( " o �=" o ( andgetstheresultingr �<� eLih e� ��" � j n j k & .3. Ursulacomputeso \Z� j�� k �w�f�� r � , andsendstheresulting o \Z� j�� k to Alice.

4. Alice computeso \Z� j�� o \Z� j�� k n � � d �� L (d n2� j n j k & , which is theclosestmatchbetweenL

andany)� y �.

Therandomvectors el and el k areusedto disguiseAlice’s andBob’sdata;therandomnumbersj and jqkareusedto disguisethequeryresultsandtheintermediateresults.Thecommunicationcostis 6 �87�9s; & .4.2.2 PIM/A pprox/AbsProtocol

First, we will presenta Monte Carlo techniquefor Alice and Bob to calculate* L d ".M d * (

L d is Alice’ssecretinput and

M d is Bob’s), andthenuseit asa building block to compute�0�d ��* L d "VM d * . Theprotocolinvolvesanuntrustedthird party, Ursula,who learnsnothing.Theprotocolworksfor bothfinite andinfinite

9

alphabets.Assumethat �f� L d��Y� and �f� M d:�_� for somenumber� . Theprotocolfor* L d "�M d * is as

follows (where@

is aparameterthataffectstheaccuracy of theestimate,and\Z�q� 7 ��j � � initially):

1. Alice generatesa randomnumberl d , andthengeneratesa sequenceof

@ " l d randomi.i.d. num-bers,eachuniformly over

� � �G� �{� .2. Alice randomlyreplaceshalf of these

@ " l d numberswith theirnegative values.

3. Alice “splices”l d zeroesinto randompositionsof theabovesequenceof

@ " l d numbers,resultingin anew sequence� of

@numbers.

4. Alice thensends� to Bob.

5. For eachnumbero from � , if o � � , Alice sends1 to Ursula;if oi�� thenAlice sends1 to Ursulaif* o *��aL d andsends0 otherwise;if oD�a� thenAlice sends0 to Ursula if

* o *s��L d andsends1otherwise.

6. For eachnumbero from � , if o � � , Bob sends0 to Ursula;if oQ�� thenBob sends1 to Ursulaif* o *J� M d andsends0 otherwise;if ov�S� thenBobsends0 to Ursulaif* o *4� M d andsends1 otherwise.

7. Ursulaincreases\u�q� 7 ��j by 1 if thevaluesshereceivesfrom Alice andBob aredifferent.

8. Ursulacomputeso \Z� j�� w\Z�q� 7 ��j 9�� , which is anunbiasedestimateof* L d "DM d *~n l d 9�� .

Becauseofl d , Ursuladoesnotknow theactualdistancebetween

L d andM d , andbecauseof thenegative

numbersamongthose@

randomnumbers,UrsulacannotfigureoutwhetherL d � M d or

L d � M d .Now, let us seehow to usethe above protocol to compute�0�d �� * L d "SM ��} d * , where

L � L � �G�G� L � and)�<�1M,�8}~��G�G�~M,�8}� :

1. Alice generatesa randomnumberl

.

2. For each�� y �

, suppose��1M,�8}~��G�G�~M,��}

� andrepeatthenext threesub-steps:

(a)\Z�q� 7 ��j � � .

(b) For each � ��|#��G�G�G� 7, Alice, Bob andUrsulausethe above protocol to compute

* L d "SM �8} d * .Therandomnumbers

l �8}~� ��G�G�G� l �8}� usedin theabove protocolaregeneratedby Alice, suchthat

� � d �� l �8} d � l .

(c) Ursulacomputeso \Z� j�� =\Z�q� 7 ��j 9�� , which is anunbiasedestimateof �D�d ��* L d " M ��} d *,n

� � d �� l �8} d 90�� = � � d ��* L d "DM �8} d *qn l 90�� .

3. Ursulacomputeso \Z� j��k �w�f�� o \u� j�� , andsendso \Z� j��k to Alice.

4. Alice computeso \Z� j�� o \Z� j�� k " l 9D�� andgetstheclosestmatchbetweenL

andany)� y �

.

Thecommunicationcomplexity is 6 �87�9C@¡9�; &.

10

4.2.3 PIM/A pprox/E

protocol

If the alphabetis predefinedand its size is finite, we cansolve a generalproblem–computingE �8L d ��M d & .

However, we cannotdirectly usethis protocol7

times to compute � � d �� E �8L d ��M d & becausethat wouldrevealeachindividual

E �8L d ��M d & result.Wewill presenttheprotocolfor computingE �8L d ��M d & here,andthen

in thefollowing sub-section,we will discusshow to useit asa building block to compute�D�d ��FE �8L d ��M d &without revealingany individual

E �8L d ��M d & .SupposeAlice hasaninput

L d ; BobhasaninputM d ; Alice wantsto know theresultof

E �8L d ��M d & withoutrevealing

L d andthe result to Bob, andBob doesnot want to reveal hisM d to Alice. After presentinga

solutionto thisproblem,we lateruseit asabuilding block to constructsolutionsto otherproblems.

E-function Protocol

Weassumetheencryptionmethodsusedbelow arecommutative.

1. Bob computesE �!¢ ��M d & for each

¢ � y¤£, where

£is thefinite (known) alphabet.Let

Obethesize

of£

.

2. Bob choosesa secretkey � , computes¥ d � E �!¢ �)��M d &�& for each¢ � yx£

, andsendsto Alice theO

results.

3. Alice choosesonefrom ¥ d � E �!¢ �)��M d &�& , W �¦|R�� O, suchthat

¢ �{� L d . This canbe donebecauseBob sentthe

Oencryptedresultsin order.

4. Alice choosesa secretkey � k , computes¥ du§ � ¥ d � E �8L d ��M d &�&�& , andsendsit backto Bob.

5. Becauseof thecommutativepropertiesof ¥ d § and¥ d , ¥ du§ � ¥ d � E �8L d � M d &�&�& isequivalentto ¥ d � ¥ du§ � E �8L d ��M d &�&�& ,whichcouldbedecryptedto ¥ du§ � E �8L d ��M d &�& by Bob. Bob sendstheresult ¥ du§ � E �8L d ��M d &�& to Alice.

6. Alice getsE �8L d ��M d & by decrypting¥ d § � E �8L d ��M d &�& .

Thetechniqueusedabove is similar to thestandardoblivioustransferprotocol;it protectstheprivacy oftheinputsfrom bothpartieswithout introducinga third-party. Thecommunicationcostis 6 �8O & , where

Ois thesizeof thealphabet.

PIM/A pprox/E Protocol

Now, let us seehow to securelycompute�f�� D�d �� E �8L d ��M �8} d &�& . As we discussedabove, we cannot

run theaboveE

-functionprotocol7

timesto get �D�d �� E �8L d ��M �8} d & . In thefollowing protocol,we will useadisguisetechniqueto hideeachindividual resultof

E �8L d ��M �8} d & .For each

)��wM,�8}~��G�G�~M,��}� , andfor each� �|#��G�G�G� 7 , let

E ��} d �8L d ��M ��} d &�� E �8L d ��M �8} d & n l �8} d , wherel �8} d is

a randomnumber, thefollowing protocolshows how A andB calculate�f�� d �� E �8L d ��M ��} d & :

1. Bob generatesa randomnumberl

thensendsl

to Alice.

2. For each��1M,�8}~��G�G�G��M,�8}

� , repeatthenext fivesub-steps:

(a) Bob constructsE �8} d �8L d ��M �8} d &:� E �8L d ��M ��} d & n l �8} d for � �¨|#��G�G�G� 7

, wherel �8}~��G�G�G� l �8}

� are7

randomnumbers.

(b) Alice andBobusetheE

-functionprotocolto computeE ��} d �8L d ��M �8} d & , for each� �©|#��G�G�G� 7 .

11

(c) Alice sends� � d �� E �8} d �8L d ��M ��} d & to Ursula.

(d) Bob sends�D�d �� l �8} d " l to Ursula.

(e) Ursulacomputeso \Z� j�� = �D�d ��FE �8} d �8L d ��M �8} d & " � �D�d �� l �8} d " l & = �D�d ��NE �8L d ��M �8} d & n l .

3. Ursulacomputeso \Z� j��k �w�f�� o \u� j�� , andsendso \Z� j��k to Alice.

4. Alice computeo \u� j�� o \u� j�� k " l , thusgettingtheactualdistancebetweenL

andtheclosest �

inthedatabase

�.

AlthoughAlice knows eachindividualE �8} d �8L d ��M �8} d & , shedoesnot know theactualvalueof

E �8L d ��M ��} d &becauseof

l ��} d . Similarly, becauseofl

, Ursuladoesnot know theactualscoreof theclosestmatch.Thecommunicationcostof theprotocolis 6 �8Oª9-7Q9�; &

, whereO

is thesizeof thealphabet,7

is thelengthofeachpattern,and

;is thesizeof thedatabase.In many cases,

Ois quitesmall. For instance,

Ois four in

DNA databases.Because

* L d "iM d * , �8L d "QM d & ( andHJ�8L d ��M d & functionsarespecialcasesof

E �8L d ��M d & , PIM/Approx/(Abs,Squ,

H) problemscanall besolvedusingtheabove protocol.

4.3 PIMPD/A pprox

Theonly differencebetweenthePIM modelandthePIMPD modelis that,in thelatter, Bob doesnot needto keepthe databasesecretfrom Alice. Therefore,all solutionsin the PIM modelcanbe appliedto thePIMPDmodelaswell. Whetherthe“public” featureof thedatabasecanresultin moreefficient solutionsisaninterestingquestion.Althoughwe do not yethave ananswerto it, weobservedthefollowing:

Observation 1. There is no secure two-partynon-interactivesolutionfor thePIMPD/Approxproblem.

Proof. A two-partynon-interactive protocolmeansBob,by himself,is ableto find theitem in thedatabasethathasminimal distancefrom thequery.

Assumethereis a two-partynon-interactive protocol « which solvesany of thePIMPD/Approxprob-lems,in anotherwords,given anencrypted/disguised form ( �qk ) of a query � , andthedatabase

�thatBob

knows, Bob canfind theitem in thedatabasethathasminimal distancefrom � asfollows. We use « � �¬� � k &to representthealgorithmon input

�and �qk .

SinceBobcanuseany databasehewants,hecanuseadatabaselike this:� k = � “axxxxxx”, “bxxxxxx”,

..., “zxxxxxx”�, supposingthat thealphabetis a setfrom ’a’ to ’z’. After applying « � � k � � k & , Bob will get

onethathastheminimaldistancefrom � . For instance,if “mxxxxxx” is theresult,Bobknowsthat’m’ is thefirst characterin � . Since « is a non-interactive protocol,Bob canreuseit on anotherdatabaseconstructedfor thepurposeof exposingthesecondcharacterin � ; hecankeepdoingthis andfigureout therestof thecharactersin � .

Therefore,if suchaprotocolexisted,thequery � wouldnotbekeptsecretfrom Bob.

Theaboveobservationdoesnot ruleout theexistenceof anefficient interactiveprotocolor amulti-partyprotocol.

4.4 SSO/Approx

In thismodel,Bob is aserviceproviderwhoprovidesstorageanddatabasequeryservicesto Alice. Accord-ing to Alice’s privacy requirement,Bob shouldknow nothingaboutthe databasethat he storesfor Alice,nor shouldheknow thequery. SoBobhasto conductadisguiseddatabasequerybasedon theencryptedordisguiseddataof Alice.

12

TherequirementthatBob shouldnot know thequeryresult,asin thePIM andPIMPD problem,is nolongerneededin theSSOproblem.Thereasonis thatBob doesnot know thecontentsof thedatabase,hedoesnot evenknow whatthedatabaseis for, sothatknowing whetherAlice’s queryis in thedatabasedoesnotdiscloseany secretinformationto Bob.

Intuitively, it canlook like that theSSO/Approxproblemmight bemoredifficult thanthePIM/Approxproblembecausein the latter Bob at leastknows the contentsof the databasewhereasin the former heknowsnothingaboutthedatabase.But knowing thecontentsof thedatabasehasadisadvantage,in thatBobcannotknow an intermediateresultbecauseheknows oneof the inputs(thedatabase);if healsoknew anintermediateresult,hemightbeableto figureouttheotherinput(thequery)of thecomputation.However, intheSSO/Approxproblem,Bobknows nothingaboutthedatabase,soit is safefor him to know intermediateresultswithout exposingthesecretquery.

WhetherBobcanknow intermediateresultsis acritical issuefor reducingthecommunicationcomplex-ity. If heknew intermediateresultsto someextent,hecouldconductthecomparisonoperationto find theminimalor maximalscore;otherwise,hehasto turn to Alice in orderto find theminimalor maximalscore,which resultsin high communicationcostin thePIM problem.

TheSSO/Approxproblemis similar to thesecureoutsourcingof scientificcomputationsproblemsstud-ied by Atallah et al. [3]. Thedifferenceis that in secureoutsourcingproblems,the inputsareprovidedbyAlice every time a computationis conductedat Bob’s side;therefore,Alice canencrypt/disguisetheinputsdifferently in different roundsof the computation.However, in the SSOproblem,oneof the inputs (thedatabase)is encrypted/disguised only once,andthis sameinput is usedin all roundsof computations;thismakestheproblemmoredifficult.

Sofar, we have a solutiononly for SSO/Approx/Squproblem.Thesolutionworksfor bothinfinite andfinite alphabets.

4.4.1 SSO/Approx/SquProtocol

SupposethatAlice wantsto outsourceherdatabase�� G�G�G��]�v�

to Bob, andwantsto know if querystring

L � L � �G�G� L � matchesany pattern �

in thedatabase�

.Thestraightforward solutionwould beto let Bob sendthewholedatabasebackto Alice, andlet Alice

conductthe queryby herself. Although this solutionsatisfiesthe privacy requirement,muchbettercom-municationcomplexity canbeachieved. Anotherintuitive questionwould bewhetherBob canconductthematchingindependentlyafter Alice sendshim the relevant informationaboutthe query. If the answeristrue,Bob shouldbeableto find the item

�thathastheclosestmatchto thequery

L. In anotherwords,if)��.Mg��G�G�~M

� and o \Z� j�� d ��8L d "VM d &)( , thenBob shouldbeableto find theminimumvalueof o \Z� j,� � .However, becauseof the privacy requirement,Bob is not allowed to know the actualquery

L, nor is he

allowed to know thecontentof thedatabase,so how doeshe computethe distanceo \u� j�� betweenL

andeachof theelement

)�in thedatabase?

Theideabehindoursolutionis basedonthefactthat eLbh e�,® � � eLU¯±° � & h��T¯ e�#® & , where¯

is aninvertiblematrix. Alice canstore

¯ e� ® insteadof e� ® atBob’s site,andkeeps secretfrom Bob. Shewill send eL<¯ ° �to Bob eachtime shewantsto senda query

L; thereforeBob cancompute eL%h e� ® without evenknowing eL

and e� . If we canuse eL�h e� ® to representthe � � d ��8L d " M d &)( , we canmake it possiblefor Bob to conducttheapproximatepatternmatching.

For each��²�ªM,�8}~��G�G�~M,��}

� in thedatabase�

, let e)�C� � �0�d �� MJ(�8} d n l " l �)��M,�8}~�Z��G�G�G��M,�8} ��|#� l ��&

, andlet

eL � � |#��"{z L ��G�G�G�p"{z L �� l´³ ��|�&

, wherel

,l�³

andl �

arerandomnumbers.Wewill have eL¬h e ® � = � � d �� MJ(�8} d"{z � � d ��gL d M �8} d n l n l�³ , andthereforeo \u� j�� = � � d ��8L d "QM �8} d &)( = eL{h e ® � nt� � � d ��gL (d " l " l�³ & . Since( � � d �� L (d " l " l�³ & is a constant,it doesnot affect the final result if we only want to find the

)�that

producestheminimum o \Z� j�� . Therefore,Bobcanuse eLQh e ® � to computetheclosestmatch.

13

Beforeoutsourcingthedatabaseto Bob, Alice randomlychoosesa secret�87?n2µ &¬¶ �87�nSµ &

invertiblematrix

¯, andcomputese� �� ¯ e ® � , thensends

� k �x� e� ��G�G�G� e� �b� to Bob.

Protocol

1. For any querystringL � L � �G�G� L � , Alice generatesa randomnumber

l ³, andconstructsa vector

eL � � |#��"{z L ��G�G�G��"{z L �� l�³ ��|�&

, thensendseLU¯ ° � to Bob.

2. Bob computeso \Z� j�� k� � eLQh e� ®� , for W ��|#��G�G�G� ; .

3. Bob computes�f�� o \Z� j�� k� , andgetsthecorrespondingW .

4. Bob returns e� � to Alice.

5. Alice computes° � e� � andgets

)�, which is theclosestmatchof herquery.

BecauseAlice andBob areinvolved in only oneroundof communication,the communicationcostis6 �87 & .

Notice that we have introducedrandomnumbersl

,l ³

,l �

for W �c|#��G�G�G� ;. The purposeof

lis

to prevent Bob from knowing theactualdistancebetweenL

andthe itemsin thedatabase;thepurposeofl ³is to preventBob from knowing therelationshipbetweentwo differentqueries;thepurposeof

l �is to

preventBob from knowing therelationshipamongitemsin thedatabase.Withoutl �

, two similar itemsinthedatabase

�would still besimilar to eachotherin thedisguiseddatabase

� k ; addinga differentrandomnumberto eachdifferentitemwill make thissimilarity disappear.

4.5 SSCO/Approx

This modelposesmorechallengesthantheSSOmodelbecaseBob could now colludeagainstAlice witha client, or he caneven becomea client. Therefore,oneof the threatswould be for Bob to compromisethe privacy of the databaseby conductinga numberof queriesandderiving the way the databaseis en-cryptedor disguised.A secureprotocolshouldresistthis typeof active attack.We have a solutionfor theSSCO/Approx/Squproblemthatworksfor bothinfinite andfinite alphabets.

4.5.1 SSCO/Approx/Squprotocol

One of the differencesbetweenthe SSCO/Approxproblemand the SSO/Approxproblemis who sendsthe query. In the SSO/Approx/Squprotocol,Alice transformsthe query

Lto a vector eLU¯ ° � , andsends

the vector to Bob; in the SSCO/Approx/Squprotocol, the client Carl will sendthe query. BecauseCarldoesnot know

¯, he cannotconstruct eLU¯ ° � by himself. If Carl could get the resultof eLU¯ ° � securely,

namelywithoutdisclosing eL to Alice andwithoutknowing¯

of course,wewouldhaveasolution.Because¯ ° � � � e� ® � ��G�G�G� e� ®· & , computing eLU¯ ° � securelyis basicallya taskof computing eL¤h e� ®d for � ��|R�� O,

whichcanbesolvedusingthesametechniqueasthatusedin solvingPIM/Approx/Squproblem.Therefore,by modifying step2 of theSSO/Approx/Squprotocolslightly, andalsoby usinga form of

“l�¸ 9:� o \Z� j�� n l ³ & ”, insteadof the form of “ o \Z� j�� n l ³ ” asis usedin SSO/Approx/Squprotocol,we

obtainaSSCO/Approx/Squprotocolasthefollowing:Let

�ª�¹�� G�G�G�� be thedatabaseAlice wantsto outsourceto Bob, andassumethe lengthof each

string)�

is7

. Alice generates;

randomnumbersl ��G�G�G� l �

. For each��0�¡M,�8}~��G�G�G��M,�8}

� , let e)�¤�� D�d �� M (�8} d n l " l ��M,��}~��G�G�G��M,�8} ��|#��|#� l ��&

; let e� �� ¯ e ® � , where is arandomlygenerated�87Cnfº &�¶ �87Cnfº &

matrix.In whatfollows,we assumethatAlice outsourcedthedatabase

� k �x� e� ��G�G�G� e� �b� to Bob.

14

Protocol

1. Whenever a client Carl wantsto conducta searchon queryL � L ��G�G� L � , he generatesa random

numberl�»

.

2. Alice generatesrandomnumbersl�³

andl ¸

.

3. Carl and Alice jointly compute e� � l ¸ eLU¯ ° � , where eL � � |#��"{z L ��G�G�G�-"´z L �� l`» � l�³ ��|�&

. Thecomputationdoesnot reveal Alice’s secret

¯,l�³

orl ¸

to Carl, nor doesit reveal Carl’s privatequery

Lorl`»

to Alice.

4. Carl thensendsthevector e� to Bob.

5. Bob computeso \Z� j�� e� h e� ®� � l ¸ � �D�d �� M (�8} d "�z ��d �� L d M ��} d n l`» n l�³ &6. Bob returnsto Alice o \u� j��k �w�f�¼� ��G�� o \Z� j�� .7. Alice computeso \Z� j�� k k = ½K¾K¿�À�Á §ÂNÃ " l�³

= �0�d �� M (�8} d "Vz �D�d �� L d M ��} d n l`» andsendsit to Carl.

8. Carl computeso \Z� j�� o \Z� j��k k n �D�d ��gL (d " l�» , which is theanswerheseeks.

Becauseofl`»

, Alice cannotfigureout theactualscorefor thisquery, andbecauseofl�³

andl ¸

, Carlcannotfigureout theactualscorebetweenhisqueryandotheritemsin thedatabase(exceptfor thematchedone),evenif Carl couldcolludewith Bob. Thecommunicationcostof theprotocolis 6 �87 ( & , mostof whichis contributedby thecomputationof

l�¸ eLU¯ ° � in step3.

5 Doing Without the Third Party

In this sectionwe presenttechniquesfor avoiding the useof the third party Ursula. We illustrate theseby giving a 2-party(without Ursula)solutionto thePIM/Approx/Squproblem. Thereadercanverify thatthey canalsoget rid of Ursulafor theotherprotocolsaswell. Thesolutionsthatusethesetechniquesare,however, lesspracticalthanthoseof theprevioussection(thatusedUrsula)mostlybecauseof therepeateduseof Yao’s millionaire protocol[34] (recall that, in this protocol,Alice andBob have a numbereachandseekto determinewhich onehasthe larger numberwithout revealinganything elseabouttheir respectivenumbers).If apractical2-partyprotocol(i.e.,withoutanUrsula)to Yao’smillionaireproblemis ever found,thenthetechniqueswe describein this sectioncouldbecomemorepractical.Anothercostincurredhereisthatof anoblivioustransferprotocol,in whichBobhas

;itemsandAlice is to receive oneof themwithout

Bobknowing whichoneshereceived[29, 26].Recallthat,in thePIM/Approx/Squproblem,Bob hasa database

�Y�� e ��G�G�G� e]�b� , wherethelengthofeachstring e)� is 7 , andAlice wantsto know the e�� y � thatmostcloselymatchesaquery eL � L �Z�G�G� L � basedon thePIM/Approx/Squmetric.

Thenew protocolconsistsof two steps:

1. Usingthe2-partymodifiedscalarproductprotocol(givenbelow), Alice obtainsN of theform scalars � � e � h eLin $ � , where$ �

is a randomscalarknown to Bob only andwhosepurposeis to hide e � h eLfrom Alice (who is supposedto learnonly thesmallestof the e)� h eL products).After this,Alice hasavector e � �! �� & andBobhasavector e$²� � $��/$Z��& . Notethattheproblemof computingthesmalleste)� h eL is thesameasthatof computingthesmallestelementin thevector eo � e " e$ . Alicehasto know thissmallestelementwithoutBob learningtheanswer(or thepositionÄ in eo atwhichtheminimumoccurs).

2. Usingthe“find minimum” protocolgivenbelow, Alice (but notBob)getstheminimumelementof eo .15

5.1 Protocol for Modified Scalar Product

Alice hasa vector eL , Bob hasa vector eM anda scalar$, andthe goal is for Alice (but not Bob) to know

eLQh eM n $ , without Alice learninganythingnew (otherthanwhatcanbeinferredfrom eLQh eM n $ ).Alice could sendÅ vectorsto Bob, with oneof themequalto eL and the othersbeing “f ake” vectors

(structuredlike eL , e.g.,randomif eL is random,etc).ThenBobcomputesthemodifiedscalarproduct em h eM n $for eachof theseÅ vectors em , andAlice thenusesaoblivioustransferto retrieve theanswerthatcorrespondsto thetrue eL . However if eL hascertainpropertiesknown to Bob, hemight beableto differentiate eL out ofthe other Å "Y| seeminglysimilar “f ake” vectors. A “one out of p” degreeof uncertaintyfor Bob is alsounacceptablysmall.

Theabove drawbackscanbe fixedby dividing vector eL intoO

randomvectors e��Z�� e� · , with eL �� ·Æ �� e� Æ . Thesamemethodasdescribedabovecanthenbeusedto let Alice know e� Æ h eM n j Æ , wherej Æ is arandomnumber, andhence� ·Æ �� j Æ �Y$ (seeFigure2). Certainly, thereis a

|out of Å possibilitythatBob

canguessthecorrect e� Æ , but since eL is thesumofO

suchrandomvectors,thechancethatBob guessesthecorrect eL is

|outof Å · , which is very smallif wechooseÅ · largeenough.

x = v1+v2+v3+v4

v3v2 v4v1

v1 y+r1, v2 y+r2,

v3 y+r3,

v4

v1

v2

v3

Oblivious Transfer

hinding v1,v2,v3,v4

Alice Bobprivate input: x private input y

among random numbers

v4 y+r4

Alice gets: x y + r = (v1 y + r1) + (v2 y + r2) + (v3 y + r3) + (v4 y + r4)

Figure2: VectorProductProtocol

Protocol

1. Alice andBob agreeon two numbersÅ andO

, suchthat Å · is solargethatconductingÅ · additionsis computationallyinfeasible.For example,Alice andBobcouldchooseÅ = 2 and

O= 1024.

2. Alice generatesO

randomvectors e�� e� · , suchthat eL � � ·Æ �� e� Æ .3. Bob generates

Orandomnumbersj �� j · , suchthat

$²� � ·Æ �� j Æ .4. For eachÄ �©|#�� O , Alice andBob conductthefollowing sub-steps:

(a) Alice sendse^<� , �� , e^�Ç to Bob,wherefor asecret� , e^ d � e� Æ andtherestarerandomvectors.

(b) Bob computese^U� h eM n j Æ , �� , e^�Ç h eM n j Æ .(c) Alice retrievesthe � th elemente^ d h eM n j Æ � e� Æ h eM n j Æ from Bob usingoblivioustransfer.

16

5. Alice computes � � ·Æ �� e� Æ h eM n j Æ &=� eLQh eM n $ .

Alice preservesher privacy by dividing the vector intoO

randomcomponentseachof which is itselfhiddenusing Å randomvectors,andby privately retrieving the results. Bob is protectedby the randomnumbersj Æ whichpreventAlice from computing eM usinga linearsystemof equations.

As in the PIM/Approx/SquproblemAlice andBob run this protocol for all �ty �

. Alice andBobthensharethe resultsfor thescoresin thevectors e and e$ , respectively. This protocolhascommunicationcomplexity 6 � Å 9-O�9�7�9�; &

.

5.2 Find Minimum Protocol

Thegoalis for Alice (but notBob)to find theminimumelementof vector eo � � o �� o �´& where eo � e " e$ ,Alice has e � �! �� ´&

, andBob has e$È� � $��/$Z�{&. NeitherAlice nor Bob shouldlearnorder

informationaboutthe o � ’s (e.g.,the o � ’ssortedorder, or eventhat o�É��1o�Ê ). Thefollowing protocolis badinthatit illegally revealssuchorderinformationbetweenthe o � ’s to bothAlice andBob,but otherwiseit doessucceedin letting Alice know the minimum elementvalue o d , without revealingthat o d to Bob (althoughBobdoesfind out � ):Preliminary (bad) protocol for finding the minimum

1. Alice mimicsthestandardminimumalgorithmand,whenever thatalgorithmrequiresthat o � becom-paredto o Æ , Alice givesBob theorderedpair W � Ä andthey do thefollowing:

(a) Alice computes�� " Æ � o ��" o Æ n $ ��"2$ Æ , andBob computesr �P$ ��"2$ Æ . (Note that

o � �>o Æ if f� �2r .)

(b) They comparetheir respective values�

and r usingYao’s millionaire protocol[34].Note. This canbe madean asymmetricversionof Yao’s millionaire problem,in which Alicebut not Bob knows theoutcomeof thecomparison,but Bob neverthelessfindsout theoutcomebasedonhisobservationof futurepairsof indicesthatAlice will sendhim (if o � �1o Æ thenindexW will reappearbut not index Ä ).

2. Let � betheindex of theminimum.Alice obtainsthedesiredo d by getting$ d from Bob,e.g.through

1-out-of-;

oblivioustransfer[26].Note. Wewill explainbelow how wecanpreventBobfrom learning� throughhisobservationsof thepairs W , Ä thatAlice sendshim. Furthermore,oblivioustransferwill thenno longerbeneededto hide� from Bob.

As notedearlier, theaboveprotocolis badbecauseit revealstoomuchinformationduringthecomputa-tion of theminimum.Beforerunningtheabove protocolAlice andBob,usinga randompermutationË thatis unknown to bothAlice andBob,couldpermutetheentriesof e , andin thesameway permutetheentriesof e$ (in effect implicitly permutingtheentriesof eo ). But thatwouldnotbeenough;eitheroneof themcouldfind the secretpermutationby comparingthe elementsof their permutedvectorto thoseof their originalvector. In order to prevent this, they have to be also “blinded” by addinga randomnumbersj � to eachelementand

�and

$ �, where j � is alsounknown to bothAlice andBob. Theway we achieve the random

permutationeffect is by usinga permutationË that is thecompositionof two randompermutationsË ³ andËIÌ , wheretheformer is known to Alice (but not Bob), andthelatter is known to Bob (but not Alice). The“additive blinding” of

�and

$/�is by usingan j �� j#k� n j#k k� wherej#k� is known to Alice (but notBob),and j#k k�

is known to Bob (but not Alice). Thepermutationandblinding aredonefirst by Alice, thenagainby Bob.

17

To performadditive blinding, we needa specialpublic key cryptosystemthat hasthe following property:¥ d �8L & h ¥ d � M�&´� ¥ d �8L%n MN& . Suchsystemsarecalledhomomorphiccryptosystemsandexamplesincludethesystemsby Benaloh[4], NaccacheandStern[25], OkamotoandUchiyama[27], andPaillier [28].

Protocol for permuting and additive blinding

Inputs:Alice has e , Bobhas e$ , Alice hasarandompermutationË ³ andarandomvector ej k known to herbutnot to Bob.Output: Bobobtainsthesetof

;valuesof theform

$ � n j k� in a permutedorderaccordingto Ë ³ (whichhedoesnot know). (Note thatAlice trivially hasthesetof

;valuesof the form

� n jqk� in a permutedorderaccordingto Ë ³ , becausesheknows Ë ³ and ej k & .

1. Alice andBob eachgeneratea key pair for a homomorphicpublic key systemandexchangetheirpublickeys. In whatfollows ¥ ³ �)h & denotesencryptionwith Alice’spublickey, and Í ³ �)h & decryptionwith Alice’s privatekey (similarly for ¥ Ì �)h & and Í Ì �)h & ).

2. Bob encryptseachentry� $ � ��/$ � &

usinghis public key andsends e$ k � � ¥{Ì � $ � & �� ¥{Ì � $ � &�& toAlice.

3. Alice doesthefollowing:

(a) ShecomputesÎ ��.$ k� h ¥ Ì � j#k� &=� ¥ Ì � $ � n jqk� & , for W ��|#�� ; .

(b) Shepermutes,accordingto Ë ³ , theorderof the Î � ’s: Let e$ k k denotethevectorof permutedÎ � ’s.

(c) Alice sends e$ k k to Bob.

4. Bob decryptstheentriesof e$ k k , obtainingthesetof;

valuesof theform$/� n j k� in a permutedorder

accordingto Ë ³ (which he doesnot know). Note that Alice trivially hasthesetof;

valuesof theform

� n j#k� in a permutedorderaccordingto Ë ³ (which shedoesknow).

SupposeAlice andBob run the above “permuteandblind” protocolonce,andthenfollow it with theearlier-givenpreliminary(“bad”) “find minimum” protocolin which thepermuted-and-blindeddatais usedinsteadof theoriginal vectors e and e$ . Unlike before,Bob now learnsnothingof the relative orderingsofthe o � ’s, anddoesnot learntheindex � for which o d is minimum. But Alice still learnstoo much(becausesheknows both Ë ³ and ej k ). This is easilyfixed: After runningtheabove “permuteandblind” protocol,werun it again,on thealreadypermuted-and-blinded data,but with rolesof Alice andBob interchanged.Asa result,Alice andBob endup with doubly-permutedanddoubly-blindeddata:Permutedaccordingto Ë ³followedby Ë Ì , thecompositionof which is unknown to bothof them(becauseAlice knows only Ë ³ andBob knows only Ë Ì ), andadditively blindedaccordingto a randomvectorthat is unknown to bothof thembecauseit is thesumof two randomvectorsthefirst of which is known only to Alice andthesecondonly toBob. In otherwords,Bobnow hasa randompermutationË of thesetof

;values

$ � n�Ïj � , andAlice hasthesamepermutationË of thesetof

;values

� nxÏj � , andneitherAlice norBobknow Ë orÏj � . It is finally safe

to usetheearlier-given(andno longerbad!) preliminaryprotocol.Thisprotocolhascommunicationcost 6 �!; & 9=Ð´n 6 �!; & 9�Ñ , where

Ðis thecostof encryptingavector

elementusinghomomorphicciphersandÑ

is thecostof performingaYao’smillionaireprotocolonavectorelement.Also,becauseweusepublic-key cryptosystems,probabilisticencryptionfor vectorelementswhosedomaincouldbeexhaustively searched,might beneeded,which further increasesthecommunicationcostfor thehomomorphicencryption.

18

6 Conclusionand Futur e Work

We have developedfour modelsfor secureremotedatabaseaccess,andpresenteda classof problemsandsolutionsfor thesemodels.For someproblems,suchasSSO/Approx/SquandSSCO/Approx/Squproblems,oursolutionsarepractical,andthey only need6 �87 & and 6 �87 ( & communicationcost,respectively; while forPIM/Approx and PIMPD/Approx problems,our resultsare still at the theoreticalstagebecauseof theirhigh communicationcost. Improving thecommunicationcostfor thosesolutionsis oneavenuefor futurework: We suspectthat,whenever thereis a dependenceon

;, that dependencecould be madesub-linear

(perhapslogarithmic)by combiningour methodswith the known powerful higherdimensionalindexingtechniques[31, 7, 2, 19, 32, 15]. However, combiningthoseschemeswith ourprotocolswill notbea trivialtask,andtheincreasein theconstantfactorshidingbehindthe“big-oh” notationmaywell negatethebenefitsof theasymptoticsub-linearityin

;; for example,in a treesearchfor processingthequery, Bob hasto be

preventedfrom knowing what nodesof his treearevisited whenprocessingthe query(otherwisehe getsinformationaboutthequery),whichrequiresusingaPIR-likeprotocolateachnodedown thetree.But eventhat is not enough:Alice herselfmustbepreventedfrom learninganything aboutBob’s dataotherthantheanswerto herquery, but in mostof thetree-basedschemesin theliteraturethecomparisonat a nodeof thesearchtreegivesinformationaboutthedatathat is associatedwith thatnode(theseschemesweredesignedfor an environmentwherethesearcheris theowner, andmay requiresubstantialmodificationbeforetheyareusedin ourcontext).

Anotheravenuefor futurework is thepatternmatchingof branchingstructures:Thepatternmatchingproblemsthat we have discussedonly involve patternsof simple linear structure;in many applications,patternshave a branchingstructure,suchas a tree or a DAG. The [ /Approx/Edit/Tree problemin ourmodelis oneof theexamples.Developinga secureprotocolto dealwith this typeof queryis a challengingproblem.

References

[1] A. ApostolicoandZ. Galil, editors.PatternMatchingAlgorithms. Oxford UniversityPress,1997.

[2] S.Arya. Ph.dthesis:Nearestneighborsearchingandapplications.TechnicalReportCS-TR-3490,UniversityofMarylandat CollegePark,June1995.

[3] M. Atallah andJ. Rice. Secureoutsourcingof scientificcomputations.TechnicalReportCOAST TR 98-15,Departmentof ComputerScience,PurdueUniversity, 1998.

[4] J.Benaloh.Denseprobabilisticencryption.In Proceedingsof theWorkshoponSelectedAreasof Cryptography,pages120–128,Kingston,ON, May 1994.

[5] B. ChorandN. Gilboa. Computationallyprivateinformationretrieval (extendedabstract).In Proceedingsof thetwenty-ninthannualACM symposiumon Theoryof computing, El Paso,TX USA, May 4-61997.

[6] M. CrochemoreandW. Rytter. Text Algorithms. OxfordUniversityPress,1994.

[7] R. Agrawal, C. FaloutsosandA. Swami. Efficient similarity searchin sequencedatabases.In Proceedingof theFourth Int’l Conferenceon Foundationsof Data OrganizationandAlgorithms, October1993. Also in LectureNotesin ComputerScience730,SpringerVerlag,1993,69–84.

[8] B. Chor, N. Gilboa andM. Naor. Private informationretrieval by keywords. TechnicalReportTR CS0917,Departmentof ComputerScience,Technion,1997.

[9] O. Goldreich. Secure multi-party computation (working draft). Available fromhttp://www.wisdom.weizmann.ac.il/home/oded/public html/foc.html,1998.

[10] P. F. Syverson,D. M. GoldschlagandM. G. Reed.Anonymousconnectionsandonionrouting. In Proceedingsof 1997IEEESymposiumon SecurityandPrivacy, Oakland,California,USA, May 5-7 1997.

19

[11] S. Goldwasser. Multi-party computations:Pastandpresent.In Proceedingsof thesixteenthannualACM sym-posiumon Principlesof distributedcomputing, SantaBarbara,CA USA, August21-241997.

[12] Y. Gertner,S. GoldwasserandT. Malkin. A randomserver model for private information retrieval. In 2ndInternationalWorkshoponRandomizationandApproximationTechniquesin ComputerScience(RANDOM’98),1998.

[13] R. GonzaleziandR. Woods.Digital ImageProcessing. Addison-Wesley, Reading,MA, 1992.

[14] D. Gusfield.Algorithmson Strings,Trees,andSequences:ComputerScienceandComutationalBiology. Cam-bridgeUniversityPress,1997.

[15] A. Guttman. R-trees:a dynamicindex structurefor spatialsearching.In ACM SIGMODWorkshopon DataMining andKnowledgeDiscovery, pages163–174,Boston,MA, 1984.

[16] G. Di-Crescenzo,Y. Ishai andR. Ostrovsky. Universalservice-providersfor databaseprivateinformationre-trieval. In Proceedingsof the17thAnnualACM Symposiumon Principlesof DistributedComputing, September21 1998.

[17] Y. IshaiandE. Kushilevitz. Improvedupperboundson information-theoreticprivateinformationretrieval (ex-tendedabstract).In Proceedingsof thethirty-firstannualACM symposiumonTheoryof computing, Atlanta,GAUSA, May 1-4 1999.

[18] A. Jain.Fundamentalsof Digital ImageProcessing. PrenticeHall, EnglewoodCliffs, NJ,1989.

[19] J.Kleinberg. Two algorithmsfor nearest-neighborsearchin high dimensions.In Proceedingsof the29thACMSymposiumonTheoryof Computing, 1997.

[20] B. Chor, O. Goldreich,E. Kushilevitz andM. Sudan. Private informationretrieval. In Proceedingsof IEEESymposiumonFoundationsof ComputerScience, Mil waukee,WI USA, October23-251995.

[21] E.Kushilevitz andR.Ostrovsky. Replicationisnotneeded:Singledatabase,computationally-privateinformationretrieval. In Proceedingsof the 38th annual IEEE computersocietyconferenceon Foundationof ComputerScience, Miami Beach,FloridaUSA, October20-221997.

[22] Y. Gertner, Y. Ishai, E. Kushilevitz and T. Malkin. Protectingdataprivacy in private information retrievalschemes.In Proceedingsof the thirtieth annualACM symposiumon Theoryof computing, Dallas,TX USA,May 24-261998.

[23] C. Cachin,S. Micali andM. Stadler. Computationallyprivateinformationretrieval with polylogarithmiccom-munication. Advancesin Cryptology: EUROCRYPT’99, Lecture Notesin ComputerScience, 1592:402–414,1999.

[24] O. Goldreich,S. Micali andA. Wigderson.How to play any mentalgame. In Proceedingsof the19thannualACM symposiumon Theoryof computing, pages218–229,1987.

[25] D. Naccacheand J. Stern. A new cryptosystembasedon higher residues. In Proceedingsof the 5th ACMConferenceon ComputerandCommunicationsSecurity, pages59–66,1998.

[26] M. NaorandB. Pinkas.Oblivioustransferandpolynomialevaluation(extendedabstract).In Proceedingsof the31thACM SymposiumonTheoryof Computing, pages245–254,Atanta,GA, USA, May 1-41999.

[27] T. OkamotoandS.Uchiyama.An efficientpublic-key cryptosystem.In Advancesin Cryptology– EUROCRYPT98, pages308–318,1998.

[28] P. Paillier. Public-key cryptosystemsbasedon compositedegreeresidueclasses.In Advancesin Cryptology –EUROCRYPT99, pages223–238,1999.

[29] M. Rabin.How to exchangesecretsby oblivioustransfer. TechnicalReportTech.MemoTR-81,AikenCompu-tationLaboratory, 1981.

[30] M. K. ReiterandA. D. Rubin. Crowds: anonymity for webtransaction.ACM Transactionson InformationandSystemSecurity, 1(1):Pages66–92,1998.

20

[31] HananSamet.Multidimensionaldatastructures.In Mikhail J. Atallah, editor, AlgorithmsandTheoryof Com-putationHandbook, chapter18.CRCPress,1999.

[32] N. Beckmann,H-P. Kriegel, R. Schneider,B. Seeger. The r*-tree: An efficient androbust accessmethodforpointsandrectangles.In ACM SIGMODWorkshopon Data Mining andKnowledgeDiscovery, pages322–331,Atlantic City, NJ,1990.

[33] D. Song,D. WagnerandA. Perrig.Practicaltechniquesfor searcheson encrypteddata.In Proceedingsof 2000IEEESymposiumon SecurityandPrivacy, Oakland,California,USA, May 14-172000.

[34] A. Yao.Protocolsfor securecomputations.In Proceedingsof the23rd AnnualIEEESymposiumonFoundationsof ComputerScience, 1982.

21

protocols for secure remote database access with approximate...

Documents