Title
Revealing the causative variant in Mendelian patient genomes without revealing patient genomes

Authors
Karthik A. Jagadeesh (1,5), David J. Wu (1,5), Johannes A. Birgmeier (1), Dan Boneh (1,2,6), Gill Bejerano (1,3,4,6)

Affiliations
1 Department of Computer Science, Stanford University
2 Department of Electrical Engineering, Stanford University
3 Department of Developmental Biology, Stanford University
4 Department of Pediatrics (Medical Genetics), Stanford University
5 These authors contributed equally
6 Corresponding authors: [email protected] (D.B) [email protected] (G.B)
Abstract
Given the rapidly growing utility of critical health information revealed in the human genome, secure genomic computation is essential to moving forward, especially as genome sequencing becomes commonplace. We devise and implement proof-of-principle computational operations for precisely identifying causal variants in Mendelian patients using secure multiparty computation methods based on Yao’s protocol. We show multiple real scenarios (small patient cohorts, trio analysis, two hospital collaboration) where the causal variant is discovered jointly, while keeping up to 99.7% of all participants’ most sensitive genomic information private. All similar operations performed today to diagnose such cases are done openly, keeping 0% of participants’ genomic information private. Our work will help usher in an era where genomes can be both utilized and truly protected.
Introduction
Rare diseases affect 1 in 33 babies. Exome and genome sequencing have revolutionized the diagnosis of thousands of rare Mendelian diseases, tracing them to thousands of different human genes [1–3]. Thousands of additional rare Mendelian diseases and human genes await discovery. Frequency-based filters have proven extremely effective in providing diagnosis in such cases [4]. In essence, variants found in a control population (common variants) are likely to be benign [5], while functional rare variants not found in the control population but seen in multiple affected individuals are likely to be disease causing [6–8]. These filters seek the gene or variant present in all (most) affected individuals but in no (very few) unaffected individuals.
For example, one can take a small cohort of unrelated individuals suspected of suffering from the same genetic disorder, and compare their genomes to those of tens of thousands of unaffected individuals (e.g., from the Exome Aggregation Consortium, ExAC [5]). As we show below, in multiple scenarios, the gene with rare functional mutations in most patients in our small cohorts is indeed causal of their condition.

Frequency-based computation highlights the fundamental “serve or protect” dilemma of genomic data: “Serve:” to find the root cause of a patient’s disease, one wishes to compare a patient genome to as many other genomes as possible, both affected and unaffected, related and unrelated. Thus, to advance modern medicine, all sequenced genomes should be shared. “Protect:” one’s genome continues to reveal more and more about oneself, including susceptibility to a variety of diseases [9].
bioRxiv preprint doi: https://doi.org/10.1101/103655; this version posted January 27, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Sharing it with others can lead to discrimination and bias. To protect its owner and next of kin, no sequenced genome should be shared.

To date, this dilemma has been solved by allowing institutions unrestricted access to all the genomes in their possession. Limited sharing between institutions is done by providing obfuscated summary statistics [10]. Current commonly adopted methods for sharing have shortcomings that make them suboptimal. Providing full access at individual institutions allows for too much information to be shared in certain situations [11]. Disease-specific beacons are prone to attack and can end up identifying individuals participating in the study [12]. Beacons also only provide allele-presence query capabilities and do not have the flexibility needed for analyzing multifactorial variant interactions within an individual [13]. It is also risky to share genomic data in the clear with third-party services specializing in genomic and disease analysis. We are unaware of any cryptographically-secure method for sharing genomic data to perform computational operations that allow identifying causal variants in patients.

To better resolve this dilemma, we first note that while all of the genomic variants from all individuals are needed to perform the computation, only a handful of causal variants are ultimately of interest in the context of Mendelian patients (in the example above, just the rare variants in the single gene mutated in most patients).
We introduce here a modern, proof-of-concept cryptographic implementation which both serves and protects. The secure computation can be run on entire genomes (Serve), while no party involved in the computation learns anything about the inputs of the other participants except for the output which is computed together (Protect). We use real patient data to show that our secure implementation reveals minimal information while diagnosing patient genomes through 3 different strategies using practical amounts of compute time and memory. Cryptographic methods have been used in different genomic contexts such as microbiome analysis [14], GWAS analysis [15] and genomic alignment [16], but this is the first implementation that we are aware of that is geared towards diagnosing Mendelian patients, a timely and potent need.
Methods

Representing genomic data as vectors
Assume each individual involved in a study has private access to their exome (or genome). If we are looking to identify a causal variant, we define a variant vector (a long list of length 28,413,589) of all possible rare missense/nonsense variants in the human genome from the first gene on chromosome 1 to the last gene on chromosome Y. We provide a copy of this vector to each individual (affected and unaffected), and ask them to privately denote True/False next to each variant (to indicate whether they have the specific mutation or not, respectively). If we are looking to identify a causal gene, we provide each individual a gene vector of 20,663 genes in the human genome from A1BG to ZZZ3. We ask them to write “1” next to a gene if they have one or more rare functional variants in this gene, and otherwise, they write “0”. See Supplementary Figure 1A,B.
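The two representations can be sketched in a few lines of Python. This is only an illustration: the four-entry variant catalog, the three gene names, and the variant-to-gene mapping are hypothetical stand-ins for the real 28,413,589-variant and 20,663-gene lists.

```python
# Hypothetical toy catalog; the real ordered list has 28,413,589 entries.
VARIANT_CATALOG = ["chr1:g.100A>T", "chr1:g.250C>G", "chr2:g.75G>A", "chrX:g.900T>C"]
# Hypothetical toy gene list; the real list runs from A1BG to ZZZ3.
GENE_CATALOG = ["GENE_A", "GENE_B", "GENE_C"]
# Hypothetical mapping from each catalog variant to the gene it falls in.
VARIANT_TO_GENE = {
    "chr1:g.100A>T": "GENE_A",
    "chr1:g.250C>G": "GENE_A",
    "chr2:g.75G>A": "GENE_B",
    "chrX:g.900T>C": "GENE_C",
}

def variant_vector(patient_variants):
    """True/False per catalog variant: does the patient carry it?"""
    return [v in patient_variants for v in VARIANT_CATALOG]

def gene_vector(patient_variants):
    """1/0 per gene: does the patient carry at least one catalog variant in it?"""
    hit_genes = {VARIANT_TO_GENE[v] for v in patient_variants}
    return [1 if g in hit_genes else 0 for g in GENE_CATALOG]

patient = {"chr1:g.250C>G", "chrX:g.900T>C"}
print(variant_vector(patient))  # [False, True, False, True]
print(gene_vector(patient))     # [1, 0, 1]
```

Because every participant fills in the same ordered catalog, position i means the same variant (or gene) in every vector, which is what lets the later operations work element by element.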
Defining computations of interest (MAX, INTERSECTION, SETDIFF)
We define three operations used for patient diagnosis (Supplementary Figure 1C). Imagine two affected individuals are represented by two rare functional variant vectors (True/False lists). Intersecting these two vectors will reveal all the rare functional variants they share. Formally, we perform a Boolean INTERSECTION (or AND) operation (x AND y = True only if x = y = True, and otherwise, it is False) between all possible patient variants. Next, if we also have access to an unaffected family member, we can further exclude any variant they share. We do this with a Boolean set difference (SETDIFF) operation (x SETDIFF y = True only if x = True and y = False, otherwise it is False). Finally, imagine we have access to a small cohort of unrelated patients sharing a set of phenotypes. We would like to find the gene affected by one or more rare functional variants in the greatest number of patients within the cohort. For this, we use the patient gene vectors (0/1 lists). We sum 0/1s across patients for each gene, and then we use the maximum (MAX) operation to find the entry (gene) with the greatest number (of affected cases; Supplementary Figure 1C).
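Before any cryptography is applied, the three operations are ordinary element-wise computations. The sketch below shows them in the clear on made-up toy vectors; the function names are illustrative, and the tie-breaking rule in `max_gene` (first index wins) is an assumption, since the text does not say how ties are handled.

```python
def intersection(x, y):
    """Element-wise AND: variants present in both vectors."""
    return [a and b for a, b in zip(x, y)]

def setdiff(x, y):
    """Element-wise x AND NOT y: variants in x that are absent from y."""
    return [a and not b for a, b in zip(x, y)]

def max_gene(gene_vectors):
    """Sum 0/1 entries per gene across patients; return (gene index, count).

    Ties are broken in favour of the lowest index (an assumption)."""
    counts = [sum(column) for column in zip(*gene_vectors)]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts[best]

# Toy trio: variants in the child that neither parent carries.
child  = [True, True, False, True]
mother = [True, False, False, False]
father = [False, False, False, True]
print(setdiff(setdiff(child, mother), father))  # [False, True, False, False]

# Toy cohort of three patients over three genes.
cohort = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
print(max_gene(cohort))  # (2, 3)
```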
Remarkably, modern cryptography allows any number of individuals to jointly learn the final result of these MAX, SETDIFF, INTERSECTION operations without any of them learning anything else about each other’s genomes (or vectors).
Encryption and decryption
An important cryptographic primitive we rely on is a secret-key encryption scheme. In a secret-key encryption scheme, a secret key is used to encrypt and decrypt messages with the guarantee that the encryptions of any two messages are indistinguishable, and yet, they can be successfully decrypted (to obtain the original message) given the key (Supplementary Figure 2).
Secure multiparty computation
Multiple mathematical frameworks and computational implementations exist for secure multiparty computation on encrypted data. These provide different tradeoffs in complexity and efficiency [17]. In this work, we use Yao’s protocol to securely evaluate functions between two parties [18]. Abstractly, we write the function as $f(x_1, x_2)$ where $x_1$ denotes the input of the first party and $x_2$ denotes the input of the second party. Any function $f(x_1, x_2)$ can be represented by a combination of Boolean operations (for example, see Supplementary Figure 3). Yao’s protocol provides a way of evaluating the Boolean circuit (operator-by-operator) without revealing the inputs $x_1$, $x_2$. We illustrate this in detail in Figure 1.
While Yao’s protocol provides a simple and efficient solution for secure two-party computation, in many of the scenarios we describe, the computation occurs among multiple parties (e.g., many individuals, each with their personal genome). It is straightforward to reduce the general problem of secure multiparty computation to that of secure two-party computation by working in the “two-cloud” model. In the two-cloud model, we assume that there are two non-colluding servers (e.g., these could be managed by two independent government agencies) that aggregate the inputs from each party in a privacy-preserving manner and then perform the computation. Each server on its own has no knowledge of the data, as shown in Figure 1.7 (see Online Methods and Discussion).
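The text does not spell out how the two servers receive inputs without either one seeing them. One common way to achieve this, assumed here purely for illustration and not confirmed by the text, is XOR secret sharing: each participant splits every bit into two random-looking shares and sends one share to each server.

```python
import secrets

def split_into_shares(bits):
    """Split a 0/1 vector into two shares, one per server (assumed scheme).

    share_a is uniformly random; share_b = bits XOR share_a. Each share
    on its own is uniformly random and carries no information."""
    share_a = [secrets.randbits(1) for _ in bits]
    share_b = [b ^ r for b, r in zip(bits, share_a)]
    return share_a, share_b

def reconstruct(share_a, share_b):
    """Only someone holding both shares can recover the input."""
    return [a ^ b for a, b in zip(share_a, share_b)]

gene_vector = [1, 0, 1, 1, 0]
to_server_1, to_server_2 = split_into_shares(gene_vector)
assert reconstruct(to_server_1, to_server_2) == gene_vector
```

Under this assumption the two servers would then run the two-party protocol on their shares, which is why the model requires that they do not collude.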
Protection quotient
We define the Protection Quotient as the fraction of private information that is not exposed (to either the other participants or the entity running the computation) during the computation. Using our encryption scheme, the Protection Quotient equals the total number of patient variants withheld from the output divided by the total number of patient variants input into the computation. Standard unencrypted patient diagnosis operations have a protection quotient of 0%, because all values must be exposed to perform the computation. All our applications below have a protection quotient of 97.1-99.7%, maximizing privacy while retaining full utility.
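As a worked example of this definition, the snippet below recomputes the quotient from counts reported in the Results: the trio analysis (524 input variants, 2 revealed) and the two-hospital analysis (2 x 159 revealed, 10,749 withheld). The function name is illustrative.

```python
def protection_quotient(total_input_variants, revealed_variants):
    """Fraction of input variants that are withheld from the output."""
    withheld = total_input_variants - revealed_variants
    return withheld / total_input_variants

# Trio: 524 variants in, 2 revealed.
trio = protection_quotient(524, 2)
# Two hospitals: 2 x 159 revealed, 10,749 withheld.
hospitals = protection_quotient(10_749 + 2 * 159, 2 * 159)
# Unencrypted diagnosis: every input variant is exposed.
in_the_clear = protection_quotient(524, 524)

print(f"{trio:.1%}")          # 99.6%
print(f"{hospitals:.1%}")     # 97.1%
print(f"{in_the_clear:.1%}")  # 0.0%
```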
Results

Example Mendelian applications of secure computation
To prove the pragmatic utility of our approach, we demonstrate three different secure operations over real Mendelian patients where we successfully identify the causal variants in each scenario (Table 1):
MAX identifies the causal gene in small patient cohorts with protection quotient above 99%
We use 4 small cohorts of unrelated individuals, suffering from very different rare diseases: Freeman Sheldon Syndrome (FSS), Hajdu-Cheney Syndrome (HCS), Kabuki Syndrome (KaS) and Miller Syndrome (MiS). Each individual holds a private list of 211-374 rare functional variants in 210-356 genes (total 767-2,754 variants per computation). We use the secure MAX function to reveal only the top gene mutated across patients in each cohort. In all 4 cohorts, we find that the gene mutated in most individuals is the one that has been proven to be the causal gene: MYH3 in FSS [6], NOTCH2 in HCS [19], KMT2D in KaS [8] and DHODH in MiS [7] (Table 1a).
Secure computation only reveals the variants in the most mutated gene in each cohort while protecting the remaining 764 variants in FSS, 1,845 variants in HCS, 2,746 variants in KaS and 1,055 variants in MiS. This computation has a protection quotient of 99.3-99.7% for all 4 cohort disease datasets. The computation is performed over all 20,663 genes and completes in just 5-10 seconds, with one server on the East Coast and the other on the West Coast (Table 1a). The total protocol execution time, bandwidth and compute time all grow logarithmically with the number of cohort individuals involved in the secure computation (Supplementary Figure 4A).
SETDIFF identifies the causal variant in a trio with protection quotient 99.6%
An unaffected mother and father and an affected male child with female external genitalia each hold a list of 164-185 (total 524) rare functional variants found in their exomes. The secure SETDIFF operation reveals to the family and test providers only 2 rare variants found in the child but in neither parent (Table 1b). Literature review provides a diagnosis based on one of these two variants, in the ACTB gene [20].
Secure computation keeps 522 variants private while sharing only 2 variants with the test provider and all individuals involved in the computation. This computation has a protection quotient of 99.6%. Because three parties are now involved, the total computation time using a single server thread on either coast is 57 minutes (Table 1b). However, the variant list can easily be split between a small computer array on either coast, such that a typical 30-node cluster brings computation time down to under 2 minutes. The protocol execution time, bandwidth and compute time all grow logarithmically with the number of family members involved in the secure computation (Supplementary Figure 4B).
INTERSECTION identifies patients of interest across 2 hospitals with protection quotient 97.1%
Two or more genome centers may want to compare their patient lists to see if together they can find multiple patients with the same rare functional mutation, and similar phenotypes, while revealing nothing else to each other. For example, we took 928 Washington Mendelian Center (WMC) patients and 282 Baylor Hopkins Center (BHC) patients. For each hospital we prepared a list of over 5,000 rare functional variants seen in one or more of their patients. Using the secure AND function, the two hospitals find a short list of just 159 variants present in both hospitals, pointing at patients who would benefit from phenotype comparison. This short list includes “positive controls” such as known disease variant NOTCH1:p.E694K, associated with partial/incomplete penetrance of aortic valve disease [21]. Indeed, the WMC and BHC patients are phenotypically characterized with left ventricular outflow defect and thoracic aortic aneurysm, respectively. The list also offers exciting novel gene-disease associations such as rare functional variant HCN3:p.R648H (with frequency $5.47 \cdot 10^{-5}$ and 0 in ExAC and 1000 genomes data, respectively). HCN3 is a voltage-gated cation channel gene, whose mouse knockout causes abnormal ventricular action potential waveform [22]. Promisingly, in patients from WMC and BHC, this mutation is correlated with dilated cardiomyopathy and coarctation of the aorta, respectively.
Secure computation only reveals 2 x 159 potential causative variants while protecting the remaining 10,749 variants with a protection quotient of 97.1%. This computation is performed over all rare functional variants in the exome with a total protocol execution time of 9.4 minutes using a single server thread on either coast (Table 1c). Because every variant is evaluated independently, a 30-node compute cluster on either end will reduce total computation time to below 20 seconds. As we learn to appreciate Mendelian mutations outside of the exome, the total time, bandwidth and compute time scale linearly with the size of the variant list shared for secure computation (Supplementary Figure 4C).
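The cluster estimates follow from the observation that each variant is evaluated independently, so the variant list can be cut into chunks that are processed in parallel. The sketch below splits a list evenly and reproduces the arithmetic under the idealized assumption of perfect linear speedup with no coordination overhead; both helper names are hypothetical.

```python
def split_evenly(items, n_chunks):
    """Cut a list into n_chunks contiguous pieces whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def ideal_parallel_seconds(single_thread_minutes, n_nodes):
    """Run time if work divides perfectly across nodes (an idealization)."""
    return single_thread_minutes * 60 / n_nodes

chunks = split_evenly(list(range(100)), 30)
print(len(chunks), min(map(len, chunks)), max(map(len, chunks)))  # 30 3 4

# Two-hospital INTERSECTION: 9.4 minutes single-threaded, 30 nodes.
print(round(ideal_parallel_seconds(9.4, 30), 1))      # 18.8 (seconds, below 20)
# Trio SETDIFF: 57 minutes single-threaded, 30 nodes.
print(round(ideal_parallel_seconds(57, 30) / 60, 1))  # 1.9 (minutes, under 2)
```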
Discussion
Rare diseases are cumulatively common (some estimate that 10% of the US population are affected with rare disorders). About 7,000 rare Mendelian conditions have been described to date. Of these, approximately 4,000 have been definitively diagnosed as single gene diseases, mapping to over 4,000 genes in the genome. The procedures we describe are applicable for all of these. There are only a handful of medical conditions attributed with certainty to the interactions of just 2 genes. Far less is known for diseases caused by more than 2 genes.

Personal genomics poses a fundamental “serve or protect” dilemma: should one serve their genome in the service of better diagnosis and ultimately disease eradication, or should one protect oneself and next of kin against potential discrimination by refusing to share their genome? This dilemma is particularly evident in the field of Mendelian diseases. It is essential to develop tools and methods to effectively share genomes while maintaining their privacy and security. Because genome privacy is best served where a definitive diagnosis exists, we focus on single disease gene discovery and diagnosis. Here we present a secure approach for multiple parties to perform exact computations that diagnose Mendelian diseases, while keeping all participating genomes private.
The scenarios we present are all real. Genome privacy is extremely appealing in all of them: Complete strangers in disease cohorts (Table 1a) learn nothing about each other except their shared disease-causing gene mutations. For participants where the assay does not provide an answer, absolutely nothing is revealed. In larger family trees, more distantly related members will appreciate genome privacy. Even in a young nuclear family (e.g., a trio; Table 1b), the test provider learns almost nothing except the likely disease-causing mutation in the offspring. Moreover, they learn virtually nothing about the parents themselves. In the two hospital scenario (Table 1c), only variants that are worthwhile comparing are revealed while the vast majority of variants remain private to each institute’s researchers and patients.
In all of these cases, the quantities revealed and those that remain private are a privacy advocate’s dream come true: We predominantly reveal only the variant(s) crucial for patient diagnosis, family counseling and any potential treatment. What remains private is predominantly variants of unknown significance (VUS) that are of little value for diagnosing one’s medical condition. However, these same VUS almost certainly uniquely identify a person as a participant in an analysis, and have the potential to reveal, now or in the future, other personal traits that may be further cause for discrimination.
For this proof-of-concept work, we assume that the protocol participants are “honest-but-curious” (sometimes referred to as “semi-honest”); that is, we assume that the parties are properly incentivized to honestly follow the protocol, but at the end of the protocol execution, they may try to learn some additional information (about other parties’ inputs) based on the messages they receive during the protocol execution. We say that a protocol is secure if the only information any party learns by participating in the protocol can be inferred just from that party’s input and the overall output of the computation. In other words, none of the parties should be able to learn something about another party’s input other than what is explicitly revealed by the output of the function.
Yao’s protocol gives an efficient solution for secure two-party computation in the presence of semi-honest adversaries [23]. We note that there are well-established ways to extend Yao’s protocol to additionally provide security against malicious parties who deviate from the protocol description in order to compromise the privacy of other participants or corrupt the results of the computation [24]. In addition, protecting against participants that submit malicious (or malformed) inputs to the protocol can be done by ensuring that if a participant’s variant vector does not meet certain criteria, or is not accompanied by an appropriate certificate, then the computation aborts and does not produce any output. Furthermore, in this paper, we introduce an operation-specific “protection quotient”, a novel metric to assess the fraction of information secured by the computation. The protection quotient can be used to further restrict the output returned to all parties if the defined privacy requirements are not met. For instance, if a trio analysis results in more than a few expected de novo exome mutations, only an error message will be produced. This approach is preferable, for example, to differential privacy [25,26], which adds random genomic variation as noise into aggregated summary statistics to try and avoid individual identification in pooled genomics data [15].
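The output restriction described above can be pictured as a final gate on the result. The sketch below is written in the clear for readability; in a deployment this check would have to be part of the securely evaluated function itself so that the suppressed output is never seen. The function name, the returned dictionary shape, and the 0.99 threshold are all hypothetical choices for illustration.

```python
def release_output(revealed_variants, total_input_variants, min_protection_quotient):
    """Return the result only if the protection quotient meets the requirement.

    Otherwise return an error message and none of the variants."""
    withheld = total_input_variants - len(revealed_variants)
    quotient = withheld / total_input_variants
    if quotient < min_protection_quotient:
        return {"error": "privacy requirement not met; output suppressed"}
    return {"variants": list(revealed_variants)}

# A trio with 524 input variants and 2 candidate de novo variants passes.
print(release_output(["var_1", "var_2"], 524, 0.99))
# {'variants': ['var_1', 'var_2']}

# The same trio producing 50 "de novo" calls is suspicious and is blocked.
too_many = [f"var_{i}" for i in range(50)]
print(release_output(too_many, 524, 0.99))
# {'error': 'privacy requirement not met; output suppressed'}
```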
The basic principle underlying our design is to perform exact secure computation on the complete (private) genomes of all participating individuals. This is in direct contrast to the more traditional and less effective routes of publishing obfuscated frequencies aggregated across multiple individuals. The computational resources we use to retain genomic privacy are not negligible, yet are perfectly within the capabilities of off-the-shelf modern computers to complete the operation in seconds or minutes, even when communicating between the East and West coasts. And while no security mechanism may be perfectly impenetrable, it is certainly preferable to have a security mechanism in place (especially if it allows for exact computation) where none currently exists. Many further extensions and applications of our computational framework are possible, and are sure to provide incentives for the development of more secure and faster methods. A widespread deployment of computer libraries efficiently implementing these principles will encourage individuals to securely contribute their genomes for the common good, and thus greatly fuel advances in both personal genomics and privacy in the 21st century.
Online Methods

Patient datasets
Whole exome sequences of patients were obtained from dbGaP studies phs000204.v1.p1 [6] (Freeman Sheldon Syndrome), phs000244.v1.p1 [7] (Miller Syndrome), phs000295.v1.p1 [8] (Kabuki Syndrome), and phs000477.v1.p1 (Hajdu-Cheney Syndrome). Pre-processed variant call format (VCF) files for patients from 2 Centers for Mendelian Genomics were obtained from dbGaP studies phs000693.v4.p1 (University of Washington), and phs000711.v3.p1 (Baylor Hopkins). Our trio family was obtained from Stanford Hospital. All human subject research was performed under guidelines approved by the Stanford Institutional Review Board.
Sequencing reads were mapped to the GRCh37/hg19 assembly of the human genome using BWA MEM v0.7.10-r789 [27]. Variants were called using GATK v3.4-46-gbc02625 following the HaplotypeCaller workflow from the GATK Best Practices [28].
Variant annotation
ANNOVAR v527 was used to annotate variants with predicted effect on protein coding genes using gene isoforms from the ENSEMBL gene set version 75 for the hg19/GRCh37 assembly of the human genome [29,30]. All canonical gene isoforms were used where the transcript start and end are marked as complete and the coding span is a multiple of three.
Cryptographic techniques
In a secure multiparty computation (MPC) protocol [18,31], a group of users (often called parties) seek to jointly compute a function over their inputs without revealing any additional information about their particular inputs. The function that the parties compute is determined based on the specific scenario. The computation consists of several rounds of interaction, where in each round, the parties exchange a series of messages. At the conclusion of the protocol, each participant learns the output of the computation evaluated on everyone’s joint input. No additional information beyond the explicit output is revealed to any party (the process is abstracted in Figure 1).
Every arithmetic computation can be expressed as a sequence of Boolean logical operations (that is, operations on bits $\{0,1\}$). This is precisely how the modern computer works. Yao’s protocol allows two users, Alice and Bob, to compute arbitrary functions over their inputs. More precisely, if Alice has an input $x$ and Bob has an input $y$, Yao’s protocol allows them to compute $f(x, y)$ in a way such that Alice learns nothing about $y$ and Bob learns nothing about $x$ other than the output value $f(x, y)$. In general, expressing a function in terms of Boolean operations greatly increases the computational cost of evaluating the function. To maximize the efficiency of Yao’s protocol, it is important to choose functionalities with simple or compact representations as Boolean circuits. An example of a Boolean circuit is shown in Supplementary Figure 3.
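To make the first claim concrete, the sketch below adds two integers using nothing but single-bit AND and XOR, the kind of arithmetic needed to sum 0/1 gene-vector entries across patients. It is a generic ripple-carry adder written for illustration, not the circuit used in the paper; all names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """Add three bits using only XOR and AND; return (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    # The two AND terms can never both be 1, so XOR acts as OR here.
    carry_out = (a & b) ^ (carry_in & (a ^ b))
    return s, carry_out

def add_bits(x_bits, y_bits):
    """Ripple-carry addition of two equal-length little-endian bit lists."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # final carry becomes the most significant bit
    return out

def to_bits(n, width):
    """Little-endian bit list of a non-negative integer."""
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

print(from_bits(add_bits(to_bits(5, 4), to_bits(6, 4))))  # 11
```

Each added bit position costs a handful of gates, which illustrates why a compact circuit representation matters for the cost of the secure evaluation.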
In this work, we cast diagnosing Mendelian patients as (simple) arithmetic/logic computations that admit efficient Boolean circuit representations. We now describe how the secure computation protocols work. To do this we first introduce two standard tools from cryptography: (1) symmetric (secret-key) encryption [32] and (2) oblivious transfer [33–35].
Encryption and decryption
A secret-key encryption scheme consists of two functions: Encrypt and Decrypt. The encryption function takes a cryptographic key $k$ and a message $m$ and outputs a ciphertext $c$. The decryption function takes the cryptographic key $k$ and a ciphertext $c$ and outputs a message $m$. Intuitively, encryption and decryption are inverse operations: if we encrypt a message under a key $k$, decrypting the resulting ciphertext with the same key $k$ recovers the original message. More precisely, we can say that for any key $k$ and any message $m$, $\mathrm{Decrypt}(k, \mathrm{Encrypt}(k, m)) = m$. In a symmetric (or secret-key) encryption scheme, both the encryption and the decryption functions require knowledge of the secret cryptographic key. The key is a random string drawn from some key-space. The precise nature of the key-space varies depending on the details of the encryption scheme, and is immaterial to our presentation in this paper.

An encryption scheme is considered to be secure if the ciphertext does not reveal any information about the underlying message to any user who does not possess the secret encryption key (certainly, a user who holds the secret key can decrypt and learn the message). One way to formalize this is to say that a user who does not have the encryption key is unable to tell an encryption of a message $m_0$ apart from an encryption of another message $m_1$. In other words, ciphertexts hide all information about their underlying message from all users who do not have the encryption key. We illustrate this in Supplementary Figure 2.
Under this definition, messages can also be encrypted multiple times. For instance, a message $m$ can be “double encrypted” under two keys $k_1$ and $k_2$ by first encrypting $m$ using $k_1$ and then encrypting the resulting ciphertext using the second key $k_2$. This procedure yields another ciphertext. Decryption proceeds by first decrypting with key $k_2$, and then decrypting the result (a ciphertext) with $k_1$. In particular, we can write

$$\mathrm{Decrypt}(k_1, \mathrm{Decrypt}(k_2, \mathrm{Encrypt}(k_2, \mathrm{Encrypt}(k_1, m)))) = m$$

Security of the double encryption scheme follows directly from the security of the underlying encryption scheme. In particular, a user who does not have both $k_1$ and $k_2$ cannot learn any information about the underlying message that has been doubly encrypted using $k_1$ and $k_2$. Numerous symmetric (secret-key) encryption schemes exist in the literature [32].
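The round-trip and double-encryption identities can be checked with a small toy cipher. The scheme below (a SHA-256-derived keystream XORed with the message, with a fresh random nonce per ciphertext) is an illustrative stand-in chosen because it needs only the standard library; it is not the scheme used in the paper and should not be used to protect real data.

```python
import hashlib
import secrets

def _keystream(key, nonce, length):
    """Derive `length` pseudorandom bytes from key and nonce (toy construction)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, message):
    """Toy secret-key encryption: random nonce followed by message XOR keystream."""
    nonce = secrets.token_bytes(16)
    pad = _keystream(key, nonce, len(message))
    return nonce + bytes(m ^ p for m, p in zip(message, pad))

def decrypt(key, ciphertext):
    """Inverse of encrypt for the same key."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = _keystream(key, nonce, len(body))
    return bytes(c ^ p for c, p in zip(body, pad))

k1, k2 = secrets.token_bytes(16), secrets.token_bytes(16)
m = b"variant present"

# Decrypt(k, Encrypt(k, m)) = m
assert decrypt(k1, encrypt(k1, m)) == m

# Double encryption: encrypt under k1, then k2; undo in the reverse order.
double = encrypt(k2, encrypt(k1, m))
assert decrypt(k1, decrypt(k2, double)) == m
```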
Oblivious transfer
An oblivious transfer (OT) protocol [33–35] is a two-party protocol between a sender and a receiver. An OT protocol enables the receiver to selectively obtain one of two possible messages from the sender without revealing to the sender which message the receiver requested. More precisely, the sender holds two messages, denoted $k_0$ and $k_1$, and the receiver holds a selection bit $b \in \{0,1\}$. At the end of the OT protocol, the receiver obtains the chosen message $k_b$ and learns nothing about the other message $k_{1-b}$. The sender does not learn anything about the receiver’s choice bit $b$. Numerous oblivious transfer protocols have been proposed in the literature [33–35].
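The sketch below captures only what each side ends up knowing after an oblivious transfer, modelled as a single trusted function. It is an interface description, not a cryptographic protocol: a real OT achieves the same input/output behaviour without any trusted intermediary. The function name is hypothetical.

```python
def ideal_oblivious_transfer(sender_messages, receiver_bit):
    """Model of what OT delivers: (what the sender learns, what the receiver learns).

    The sender holds (k_0, k_1); the receiver holds a selection bit b.
    The receiver gets k_b only; the sender gets nothing, in particular not b."""
    if receiver_bit not in (0, 1):
        raise ValueError("selection bit must be 0 or 1")
    k0, k1 = sender_messages
    receiver_output = k1 if receiver_bit == 1 else k0
    sender_output = None  # the sender learns nothing about the selection bit
    return sender_output, receiver_output

print(ideal_oblivious_transfer((b"key-for-0", b"key-for-1"), 1))
# (None, b'key-for-1')
```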
Overview of steps for secure computation
In a secure two-party computation protocol, Alice holds an input $x \in \{0,1\}^n$ and Bob holds an input $y \in \{0,1\}^n$. We write $\{0,1\}^n$ to denote a binary input of length $n$ (e.g., $n$ could be the length of the binary representation of the variant vector or the gene vector we define in our main text). Their goal is to compute a function $f(x, y)$ on their joint input $(x, y)$. The computation is considered “secure” if at the end of the computation, the only information that Alice and Bob learn is the function value $f(x, y)$ and nothing else about the other party’s input. It is important to note here that the function output $f(x, y)$ could reveal some information about the inputs $x$ and $y$ (for example, in our trio scenario, whatever de novo variant we report in the child, we can deduce by definition does not exist in either parent). As we note in the Discussion section, we work in the honest-but-curious model where we assume that Alice and Bob follow the protocol specification as directed, but may, at the end of the protocol execution, try to infer some additional information about each other’s private input. We now describe how Yao’s protocol can be used to securely evaluate any function over two inputs in the honest-but-curious model.
To apply Yao’s protocol, it is first necessary to represent the function $f$ as a Boolean circuit on inputs $x$ and $y$. At the most basic level, the building blocks we have are AND (x AND y = True only if x = y = True, otherwise it is False) and XOR (exclusive-or, x XOR y = True only if x = True and y = False, or if x = False and y = True, otherwise it is False) gates. These basic gates can be combined to obtain circuits of arbitrary expressive functionalities [17]. In our description below, we will oftentimes refer to the concrete example of securely evaluating the AND function on single-bit inputs (above). A visualization of the complete protocol is given in Figure 1.
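As a small illustration of combining the two basic gates, the sketch below builds NOT, OR, and the SETDIFF operation from AND and XOR alone (plus the constant 1). These are standard Boolean identities and are not taken from the paper's circuits.

```python
def AND(x, y):
    return x & y

def XOR(x, y):
    return x ^ y

def NOT(x):
    """Flip a bit by XORing with the constant 1."""
    return XOR(x, 1)

def OR(x, y):
    """x OR y = (x XOR y) XOR (x AND y)."""
    return XOR(XOR(x, y), AND(x, y))

def SETDIFF(x, y):
    """1 only if x = 1 and y = 0."""
    return AND(x, NOT(y))

# Print the full truth table: x, y, AND, XOR, OR, SETDIFF.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), XOR(x, y), OR(x, y), SETDIFF(x, y))
```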
Overview of Yao’s secure two-party computation protocol
We now describe Yao’s protocol. For ease of presentation, we present a simplified (but less efficient) description of Yao’s protocol here. Our implementation (based on the JustGarble [36] library) follows the high-level blueprint described here, but includes several optimizations, notably the free-XOR [37] and half-gate [38] optimizations.

Step 1: For each wire in the circuit, Alice chooses two keys. Recall that in a Boolean circuit, each wire in the circuit can take on two possible values (0 or 1; sometimes also referred to as False and True, respectively). Alice associates one of the keys with the wire value 0 and another key with the wire value 1. For the particular case of securely evaluating the AND function $z = x \wedge y$, Alice picks three pairs of keys and associates one pair with each of $x$, $y$, and $z$: $k_x^0, k_x^1, k_y^0, k_y^1, k_z^0, k_z^1$. In this example, $k_x^0$ is the key associated with the input bit $x$ taking on the value 0 and $k_z^1$ is the key associated with the output bit $z$ taking on the value 1.

Step 2: For each gate in the circuit, Alice constructs a “garbled” truth table. For each row in the truth table, the algorithm takes the key associated with the value of the output wire and double encrypts it using the two keys associated with the values of the two input wires. For the particular case of evaluating a single AND gate, Alice would construct the following table of ciphertexts:
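• $c_1 = \mathrm{Encrypt}_{k_x^0}(\mathrm{Encrypt}_{k_y^0}(k_z^0))$
• $c_2 = \mathrm{Encrypt}_{k_x^0}(\mathrm{Encrypt}_{k_y^1}(k_z^0))$
• $c_3 = \mathrm{Encrypt}_{k_x^1}(\mathrm{Encrypt}_{k_y^0}(k_z^0))$
• $c_4 = \mathrm{Encrypt}_{k_x^1}(\mathrm{Encrypt}_{k_y^1}(k_z^1))$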
For the output wires of the circuit, instead of double encrypting an encryption key, Alice directly double encrypts the value of the output wire (e.g., 0 or 1). This step is shown in Figure 1.2.

Step 3: After garbling the circuit (Steps 1-2/Figure 1.1-1.2), the secure computation begins with Bob using the oblivious transfer protocol (above) to obtain the keys for the input wires associated with his input. The oblivious transfer protocol ensures the following: Bob only learns one of the two keys associated with each of his input wires (this ensures that Bob can only evaluate the function on a single set of inputs), and Alice does not learn which key Bob requested (that is, Alice does not learn Bob’s input). For the particular case of evaluating a single AND gate, if Bob’s input is $b \in \{0,1\}$, then Bob would play the role of the receiver in an OT protocol with input $b$. Alice would play the role of the sender with messages $k_y^0, k_y^1$ (the keys associated with Bob’s input wire). At the end of the oblivious transfer protocol, Bob obtains $k_y^b$ (the key associated with his input), and learns nothing about the key associated with the complement of his input ($k_y^{1-b}$). Alice learns nothing about Bob’s input $b$. This step is shown in Figure 1.3.

Step 4: After Bob receives the keys associated with his input via the oblivious transfer protocol, Alice sends Bob the garbled tables associated with each gate (after randomly permuting the rows of each table). Additionally, Alice sends Bob the wire encodings of her input. For the particular case of evaluating a single AND gate, if Alice’s input is $a \in \{0,1\}$, Alice would send $k_x^a$ (the key associated with $x = a$) to Bob. This step is shown in Figure 1.4.

Step 5: With all this information, Bob can complete the function evaluation and compute the output. In particular, after Steps 3 and 4, Bob should have a single key for each of the input wires of the Boolean circuit. Then, for each gate in the circuit, Bob takes the input keys he has and attempts to decrypt the rows in the garbled table associated with that gate. Because the entries in the garbled table are double encrypted using the keys associated with the input wires to the gate and Bob only has a single key for each of the wires, Bob is only able to decrypt a single row in the garbled table, as shown in Figure 1.5.

Step 6: In doing so, Bob is able to learn one of the keys associated with the gate’s output wire (moreover, by construction of the garbled table, the output key Bob obtains is precisely the one associated with the value corresponding to evaluation of the gate on the input bits). Thus, starting with the input wires, Bob is able to evaluate the circuit gate-by-gate, as seen in Figure 1.6. Once Bob reaches the output layer of the circuit, he is able to decrypt the ciphertexts and obtain the value of each output wire.

Summary. To summarize, in Yao’s secure two-party computation protocol, Alice begins by constructing a garbled truth table for each gate in the Boolean circuit. She does so by double encrypting each row in the truth table (using the keys associated with the input bits). She gives Bob the garbled truth tables as well as the keys associated with her input. Using oblivious transfer, Bob obtains the keys associated with his input. Armed with a single key for each of the input wires in the circuit, Bob is able to evaluate the garbled circuit gate-by-gate. For each gate, Bob takes his input keys and uses them to decrypt one of the rows of the garbled table associated with the gate. This yields the key associated with the gate’s output wire. Finally, at the end of the computation, Bob decrypts the ciphertexts associated with the output wires to learn the output of the circuit. Bob then sends the result of the computation to Alice.
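The garbling and evaluation of a single AND gate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy cipher (a SHAKE-256 key stream plus a zero tag that lets the evaluator recognize the one row that decrypts), the key length, and all function names are hypothetical. Oblivious transfer is omitted (Bob's key is simply handed over), and the output wire yields a key that Alice maps back to a bit rather than a directly encrypted value.

```python
import hashlib
import secrets

KEY_LEN = 16           # bytes per wire key (illustrative choice)
NONCE_LEN = 16
TAG = b"\x00" * 16     # zero padding that marks a successful decryption


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def _pad(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy key stream: SHAKE-256 over key and nonce (hypothetical construction).
    return hashlib.shake_256(key + nonce).digest(length)


def encrypt(key: bytes, message: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    plaintext = message + TAG
    return nonce + _xor(plaintext, _pad(key, nonce, len(plaintext)))


def decrypt(key: bytes, ciphertext: bytes):
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    plaintext = _xor(body, _pad(key, nonce, len(body)))
    if not plaintext.endswith(TAG):
        return None          # wrong key for this row
    return plaintext[:-len(TAG)]


def garble_and_gate():
    """Alice: pick two keys per wire and build the shuffled garbled table."""
    keys = {w: (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN))
            for w in "xyz"}
    table = []
    for a in (0, 1):
        for b in (0, 1):
            inner = encrypt(keys["y"][b], keys["z"][a & b])
            table.append(encrypt(keys["x"][a], inner))
    secrets.SystemRandom().shuffle(table)   # hide which row is which
    return keys, table


def evaluate(table, key_x: bytes, key_y: bytes) -> bytes:
    """Bob: try every row; only the row matching both keys decrypts."""
    for row in table:
        inner = decrypt(key_x, row)
        if inner is None:
            continue
        out_key = decrypt(key_y, inner)
        if out_key is not None:
            return out_key
    raise ValueError("no row decrypted")


if __name__ == "__main__":
    keys, table = garble_and_gate()
    a, b = 1, 1
    out_key = evaluate(table, keys["x"][a], keys["y"][b])
    print("a AND b =", keys["z"].index(out_key))
```

With one key per input wire, Bob recovers exactly one output key; the other three rows fail the tag check (except with negligible probability), which mirrors the "single row" property described in Step 5.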
Extending Yao’s protocol to N parties
Yao’s protocol allows two parties to securely evaluate an arbitrary function. However, in general, we wish to compute across a large number of parties (e.g., study participants). While there are secure multiparty computation protocols that support more than two parties (e.g., the SPDZ³⁹, GMW³¹, or BGW⁴⁰ protocols), a key limitation of these protocols is that they require all participating parties to be online during the protocol execution. Moreover, the number of rounds of communication in these protocols often grows with the complexity of the computation (in direct contrast with Yao’s protocol, which is a two-round protocol regardless of how complicated the computation is). As a result, there are substantial engineering hurdles to deploying these general protocols for multiparty computation across a large number of parties. In some cases (e.g., BGW⁴⁰ and GMW³¹), the total bandwidth also scales quadratically in the number of parties, further limiting the practicality of these protocols.
A more efficient solution for general multiparty computation, one that avoids both the requirement that participating parties be online during the protocol execution and the potential communication blowup, is to work in a “two-cloud” model. In this model, we assume there are two non-colluding cloud servers that facilitate the protocol execution. At the beginning of the protocol execution, each of the participating parties “splits” its input and shares it with the two cloud servers. As long as the two clouds do not collude with each other, they do not learn anything about the inputs to the computation. After the two cloud servers have received the inputs from each of the participating parties, they engage in a two-party secure computation protocol (such as Yao’s protocol) to compute the function of interest. Notably, the parties that contributed the data do not have to be online during this step of the protocol. Moreover, communication is only necessary between the parties and the cloud servers; in particular, the parties do not have to communicate with each other. In a practical deployment, these two cloud servers might be managed by distinct governmental organizations within the NIH or WHO. Thus, by working in the two-cloud model, it is possible to transform any computation between $n$ individuals into a secure two-party computation between two non-colluding parties.
Step 7: We now describe how to securely evaluate any functionality in the two-cloud model. Suppose there are $n$ parties participating in the protocol execution and let $x_1, \ldots, x_n$ denote their private inputs. To securely compute a function $f$, each of the $n$ participants chooses a random value $r_i$ and sends $r_i$ to one of the two cloud servers. They then send to the other cloud server the value $x_i - r_i$ (note that the subtraction is performed modulo a large integer $p$). Once every party has submitted their shares $r_i$ and $x_i - r_i$ to the two cloud servers, the first cloud server has a vector of random values $a = (r_1, \ldots, r_n)$ and the second cloud server has a vector of random differences $b = (x_1 - r_1, \ldots, x_n - r_n)$. Because each $r_i$ is random and the subtraction takes place modulo $p$, the values in $b$ are distributed uniformly and, more importantly, independently of the $x_i$’s. The pair $(r_i, x_i - r_i)$ is often referred to as an “additive secret sharing” of the input $x_i$. The property that this additive secret sharing scheme satisfies is that a single share reveals no information about the input, but the two shares together completely define the input. This means that as long as the two cloud servers do not collude, they learn no information about each party’s input (since they each possess just one share of the secret).
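The additive secret sharing step can be sketched as follows. This is a minimal illustration, assuming a specific modulus (the text only says "a large integer") and hypothetical function names; it is not taken from the paper's released code.

```python
import secrets

# Illustrative modulus; the source only specifies "a large integer".
P = 2**61 - 1


def share(x: int, p: int = P):
    """Split x into two additive shares modulo p."""
    r = secrets.randbelow(p)      # uniform in [0, p)
    return r, (x - r) % p         # one share per cloud server


def reconstruct(share_1: int, share_2: int, p: int = P) -> int:
    """Adding the two shares modulo p recovers the input."""
    return (share_1 + share_2) % p


if __name__ == "__main__":
    secret = 1                    # e.g., one entry of a participant's vector
    s1, s2 = share(secret)
    print(reconstruct(s1, s2) == secret)
```

Each share on its own is a uniformly distributed value modulo `P`, which is why a single non-colluding server learns nothing about the input.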
To complete the secure computation (of a function $f$), the two cloud servers simply apply Yao’s protocol to the following two-party functionality:
$$g\big((r_1, \ldots, r_n),\, (x_1 - r_1, \ldots, x_n - r_n)\big) = f\big(r_1 + (x_1 - r_1), \ldots, r_n + (x_n - r_n)\big) = f(x_1, \ldots, x_n).$$
In other words, the two clouds compute the functionality that takes as input two vectors (each containing $n$ values) and outputs the function $f$ evaluated on the components of the sum of the two input vectors. Since summing the input vectors in this case reconstructs each party’s input, this procedure corresponds precisely to evaluating $f$ on the parties’ inputs. Moreover, the two cloud servers do not learn any additional information about any particular party’s input because the evaluation of $g$ is performed using Yao’s protocol (which is a secure two-party computation protocol). This procedure is shown in Figure 1.7.
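A small worked instance may help make the functionality concrete. The numbers below are purely illustrative (a toy modulus of 11, two parties, and the function taken to be the maximum); none of them come from the paper.

```latex
\begin{align*}
p &= 11, \qquad (x_1, x_2) = (3, 7), \qquad (r_1, r_2) = (5, 2) \\
\text{cloud 1 holds } & (r_1, r_2) = (5, 2) \\
\text{cloud 2 holds } & (x_1 - r_1,\; x_2 - r_2) \bmod 11 = (9, 5) \\
\text{inside the circuit: } & (5 + 9,\; 2 + 5) \bmod 11 = (3, 7) \\
\text{output: } & \max(3, 7) = 7
\end{align*}
```

Neither server ever sees the pair (3, 7) in the clear; the recombination happens only inside the garbled circuit.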
Constructing our Boolean circuits
As described above, arbitrary Boolean circuits can be constructed using only AND and XOR gates. To efficiently represent our set-intersection-based algorithms as Boolean circuits, we first construct some intermediate building blocks from the basic AND and XOR gates. The intermediate building blocks we require include addition circuits, comparison circuits, and equality circuits. For these building blocks, we use the circuits by Kolesnikov et al.⁴¹ (see Supplementary Figure 5).
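To illustrate how such building blocks reduce to AND and XOR gates, the sketch below simulates bit-level addition, equality, less-than, and multiplexer circuits in Python using only those two gate types (plus XOR with the constant 1 for negation). The constructions use one AND gate per bit, in the spirit of the cited building blocks, but the code and its function names are an illustration written for this section and are not claimed to reproduce the authors' exact circuits.

```python
def xor(a: int, b: int) -> int:
    return a ^ b


def and_(a: int, b: int) -> int:
    return a & b


def to_bits(value: int, k: int):
    """Little-endian k-bit representation."""
    return [(value >> i) & 1 for i in range(k)]


def from_bits(bits) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def add(x_bits, y_bits):
    """Ripple-carry addition modulo 2^k, one AND gate per bit."""
    carry, out = 0, []
    for x, y in zip(x_bits, y_bits):
        out.append(xor(xor(x, y), carry))
        # Majority of (x, y, carry) written with a single AND.
        carry = xor(carry, and_(xor(x, carry), xor(y, carry)))
    return out


def equal(x_bits, y_bits) -> int:
    """1 if all bit pairs match; NOT is XOR with the constant 1."""
    result = 1
    for x, y in zip(x_bits, y_bits):
        result = and_(result, xor(xor(x, y), 1))
    return result


def less_than(x_bits, y_bits) -> int:
    """1 if x < y, scanning from least to most significant bit."""
    c = 0
    for x, y in zip(x_bits, y_bits):
        # c becomes 1 when y's bit exceeds x's bit, or the bits tie and c was 1.
        c = xor(y, and_(xor(y, c), xor(x, c)))
    return c


def mux(s: int, x0_bits, x1_bits):
    """Select x0 when s = 0 and x1 when s = 1."""
    return [xor(a, and_(s, xor(a, b))) for a, b in zip(x0_bits, x1_bits)]


if __name__ == "__main__":
    k = 4
    x, y = to_bits(9, k), to_bits(12, k)
    print(from_bits(add(x, y)), equal(x, y), less_than(x, y))
```

A two-input max circuit then follows by feeding the `less_than` output into `mux` as the selection bit.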
Software implementation
We use the JustGarble library³⁶ for our implementation of Yao’s garbled circuits, and the Asharov et al.⁴² implementation of the oblivious transfer protocols. For better performance, we also implement the half-gates optimization³⁸ for Yao’s garbled circuits. This implementation will be released upon publication. For our benchmarks, we set up a client and server on Amazon EC2 (to simulate the two cloud providers), and measure the total compute time, bandwidth, and overall protocol execution time (taking into account the network communication). We run our experiments on two memory-optimized EC2 instances (M4.2xlarge). Each instance runs an 8-core 2.4 GHz Intel Xeon E5-2676 v3 (Haswell) processor and has 32 GB of memory. While our protocols are naturally parallelizable, we use a single thread of execution in all of our experiments, and do not take advantage of the available parallelism. To simulate the non-colluding two-cloud model, we used a wide-area network (WAN) setting where the two servers are far apart. We placed one of the servers on the West Coast (specifically, in the Northern California availability zone) and the other on the East Coast (specifically, in the Northern Virginia availability zone).
References
1. Yang, Y. et al. Clinical Whole-Exome Sequencing for the Diagnosis of Mendelian Disorders. N. Engl. J. Med. 369, 1502–1511 (2013).
2. Iglesias, A. et al. The usefulness of whole-exome sequencing in routine clinical practice. Genet. Med. 16, 922–931 (2014).
3. Lee, H., Deignan, J. L., Dorrani, N. et al. Clinical exome sequencing for genetic identification of rare mendelian disorders. JAMA 312, 1880–1887 (2014).
4. Rehm, H. L. et al. ACMG clinical laboratory standards for next-generation sequencing. Genet. Med. Off. J. Am. Coll. Med. Genet. 15, 733–747 (2013).
5. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
6. Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).
7. Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35 (2010).
8. Ng, S. B. et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 42, 790–793 (2010).
9. Moreno-Estrada, A. et al. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344, 1280–1285 (2014).
10. Mailman, M. D. et al. The NCBI dbGaP database of genotypes and phenotypes. Nat. Genet. 39, 1181–1186 (2007).
11. Siu, L. L. et al. Facilitating a culture of responsible and effective sharing of cancer genome data. Nat. Med. 22, 464–471 (2016).
12. Shringarpure, S. S. & Bustamante, C. D. Privacy Risks from Genomic Data-Sharing Beacons. Am. J. Hum. Genet. 97, 631–646 (2015).
13. Regalado, A. Networks of Genome Data Will Transform Medicine. MIT Technology Review. Available at: https://www.technologyreview.com/s/535016/internet-of-dna/. (Accessed: 30th September 2016)
14. Wagner, J., Paulson, J. N., Wang, X., Bhattacharjee, B. & Corrada Bravo, H. Privacy-preserving microbiome analysis using secure computation. Bioinformatics 32, 1873–1879 (2016).
15. Simmons, S., Sahinalp, C. & Berger, B. Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations. Cell Syst. 3, 54–61 (2016).
16. Popic, V. & Batzoglou, S. Privacy-Preserving Read Mapping Using Locality Sensitive Hashing and Secure Kmer Voting. bioRxiv 046920 (2016). doi:10.1101/046920
17. Lindell, Y. & Pinkas, B. Secure Multiparty Computation for Privacy-Preserving Data Mining. J. Priv. Confidentiality 1, 59–98 (2009).
18. Yao, A. C.-C. Protocols for Secure Computations. in Annual Symposium on Foundations of Computer Science 160–164 (1982).
19. Simpson, M. A. et al. Mutations in NOTCH2 cause Hajdu-Cheney syndrome, a disorder of severe and progressive bone loss. Nat. Genet. 43, 303–305 (2011).
20. Rivière, J.-B. et al. De novo mutations in the actin genes ACTB and ACTG1 cause Baraitser-Winter syndrome. Nat. Genet. 44, 440–444, S1-2 (2012).
21. McBride, K. L. et al. NOTCH1 mutations in individuals with left ventricular outflow tract malformations reduce ligand-induced signaling. Hum. Mol. Genet. 17, 2886–2893 (2008).
22. Fenske, S. et al. HCN3 contributes to the ventricular action potential waveform in the murine heart. Circ. Res. 109, 1015–1023 (2011).
23. Lindell, Y. & Pinkas, B. A Proof of Security of Yao’s Protocol for Two-Party Computation. J. Cryptol. 22, 161–188 (2009).
24. Hazay, C. & Lindell, Y. Efficient Secure Two-Party Protocols - Techniques and Constructions. (Springer, 2010).
25. Dwork, C. Differential Privacy. in ICALP 1–12 (2006).
26. Dinur, I. & Nissim, K. Revealing information while preserving privacy. in PODS 202–210 (2003).
27. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 26, 589–595 (2010).
28. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
29. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164–e164 (2010).
30. Cunningham, F. et al. Ensembl 2015. Nucleic Acids Res. 43, D662-669 (2015).
31. Goldreich, O., Micali, S. & Wigderson, A. How to Play any Mental Game or A Completeness Theorem for Protocols with Honest Majority. in Annual ACM Symposium on Theory of Computing 218–229 (1987).
32. Katz, J. & Lindell, Y. Introduction to Modern Cryptography. (Chapman and Hall/CRC Press, 2007).
33. Rabin, M. O. How To Exchange Secrets with Oblivious Transfer. IACR Cryptol. EPrint Arch. 2005, 187 (2005).
34. Kilian, J. Founding Cryptography on Oblivious Transfer. in Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA 20–31 (1988).
35. Naor, M. & Pinkas, B. Oblivious Transfer and Polynomial Evaluation. in Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, May 1-4, 1999, Atlanta, Georgia, USA 245–254 (1999).
36. Bellare, M., Hoang, V. T., Keelveedhi, S. & Rogaway, P. Efficient Garbling from a Fixed-Key Blockcipher. in IEEE Symposium on Security and Privacy 478–492 (2013).
37. Kolesnikov, V. & Schneider, T. Improved Garbled Circuit: Free XOR Gates and Applications. in International Colloquium on Automata, Languages and Programming 486–498 (2008).
38. Zahur, S., Rosulek, M. & Evans, D. Two Halves Make a Whole - Reducing Data Transfer in Garbled Circuits Using Half Gates. in EUROCRYPT 220–250 (2015).
39. Damgard, I., Pastro, V., Smart, N. P. & Zakarias, S. Multiparty Computation from Somewhat Homomorphic Encryption. in CRYPTO 643–662 (2012).
40. Ben-Or, M., Goldwasser, S. & Wigderson, A. Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation (Extended Abstract). in STOC 1–10 (1988).
41. Kolesnikov, V., Sadeghi, A.-R. & Schneider, T. Improved Garbled Circuit Building Blocks and Applications to Auctions and Computing Minima. in Cryptology and Network Security 1–20 (2009).
42. Asharov, G., Lindell, Y., Schneider, T. & Zohner, M. More efficient oblivious transfer and extensions for faster secure computation. in ACM CCS 535–548 (2013).
Author Contributions
KJ, DW, DB and GB designed the study, analyzed results and wrote the manuscript. KJ and JB processed patient data. KJ and DW wrote software for the analysis.

Acknowledgements
We thank Dr. Jon Bernstein and members of the Boneh and Bejerano labs for valuable discussions and project feedback. We also thank Stanford patients and clinicians, as well as the patients and professionals involved in the deposition of the dbGaP sets we use.
Figures/Tables

Figure 1
[Illustration: Yao’s protocol as a locked-box analogy, panels 0-7. Panel text follows.]
Panel 0. Alice holds a secret value a: a = True or a = False. Bob holds a secret value b: b = True or b = False. Alice and Bob want to compute together f(a,b) = a AND b without Bob discovering anything about a, or Alice discovering anything about b.
Panel 1. Alice prepares: 2 types of big boxes + keys (for each value she may be holding), labeled AT and AF; 2 types of small boxes + keys (for each value Bob may be holding), labeled BT and BF; and 2 types of notes (with the 2 possible outcomes), T and F. The notes fit in the small boxes, which lock and fit into the big boxes, which also lock.
Panel 2. Alice prepares four big locked boxes matching all four possible computations (Alice’s a, Bob’s b, f(a,b) = a AND b): T, T gives T; T, F gives F; F, T gives F; F, F gives F. The boxes locked with (BT, AT), (BF, AT), (BT, AF) and (BF, AF) hold the notes T, F, F and F, respectively.
Panel 3. Alice puts on the table the two keys (labeled BT and BF) she prepared for the small boxes and leaves the room.
Panel 4. Bob enters the room and picks up the appropriate key “Bb”: BT if his secret value b = True, BF if b = False. After picking up his key, he leaves the room. Alice then gives Bob all four unmarked big locked boxes. She also gives him one unmarked key “Aa”: AT if her secret value a = True, AF if a = False.
Panel 5. Bob now holds 4 unmarked big locked boxes, a key from Alice Aa, and his own key Bb. He tries to get the note from all four boxes, using Aa on the big boxes and Bb on the small ones. By design Bob can only reach a single note. This note holds the correct answer for a AND b, which Alice and Bob set out to compute together. Alice has learned nothing about Bob’s value b (she left the room before Bob picked his key). Bob has learned nothing about Alice’s value a (he received from Alice an unlabeled key).
Panel 6. For a function f(A,B) made of many operations: instead of providing an answer, Alice provides the correct unmarked key for the next step.
Panel 7. Each private genome G1, G2, …, Gn is split using a random number R1, R2, …, Rn. Computer A (West Coast) receives G1-R1, G2-R2, …, Gn-Rn and Computer B (East Coast) receives R1, R2, …, Rn. f(A,B) = f((G1-R1, G2-R2, …, Gn-Rn), (R1, R2, …, Rn)) = f(G1-R1+R1, G2-R2+R2, …, Gn-Rn+Rn) = f(G1, G2, …, Gn) = secure computation with all n genomes.
Table 1
Column groups: Operation; Relevant Information for each Operation; Running Time Measurements.

(a) MAX (over genes). Scenario: Small disease cohort

| Disease | # unrelated probands (who avoid openly sharing their data) | Rare functional variants (genes) per proband (median) | # probands with rare functional variant/s in gene, with gene name (top 3, descending order) | Proven causal gene for disease | Protection quotient (1 - # of variants shared of top gene / total # of variants) | Bandwidth (GB) | Compute (sec) | Network (sec) |
|---|---|---|---|---|---|---|---|---|
| Freeman Sheldon Syndrome | 3 | 258 (253) | 3 MYH3; 2 DBT; 1 ACADVL | MYH3 | 1 - 3/767 = 99.6% | 0.02 | .15 | 4.91 |
| Hajdu-Cheney Syndrome | 7 | 278 (272) | 6 NOTCH2; 3 HLA-DRBI; 3 MCC | NOTCH2 | 1 - 8/1853 = 99.6% | 0.03 | .18 | 7.29 |
| Kabuki Syndrome | 10 | 262 (257) | 8 KMT2D; 3 COL6A1; 3 FLNB | KMT2D | 1 - 8/2754 = 99.7% | 0.04 | .22 | 9.59 |
| Miller Syndrome | 4 | 267 (258) | 4 DHODH; 3 DNAH5; 2 ACOX2 | DHODH | 1 - 8/1063 = 99.3% | 0.03 | .18 | 7.29 |

(b) SETDIFF (over variants). Scenario: familial (Trio)

| Patient ID (avoid sharing w provider) | # rare functional variants | # proband only variants (revealed) | Gene name | Proven causal gene | Protection quotient | Bandwidth (GB) | Compute (min) | Network (min) |
|---|---|---|---|---|---|---|---|---|
| 115-f | 185 | N/R | N/R | ACTB | 1 - 2/524 = 99.6% | 18.1 | 1.7 | 56.7 |
| 115-m | 164 | N/R | N/R | | | | | |
| 115-a1 | 175 | 2 | ACTB, USH2A | | | | | |

(c) INTERSECTION (over variants). Scenario: 2 Hospitals

| Hospital | # suspicious variants (not shared) | Total intersecting variants (for patient phenotype comparison follow-up) | Protection quotient | Bandwidth (GB) | Compute (min) | Network (min) |
|---|---|---|---|---|---|---|
| Washington | 5,734 | 159 | 1 - 318/11,067 = 97.1% | 3.1 | 0.37 | 9.4 |
| Baylor | 5,333 | | | | | |
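The protection quotient column is simple arithmetic, which the short sketch below reproduces for two rows of Table 1. The function name is hypothetical. For the two-hospital row, the numerator 318 appears to count the 159 intersecting variants once per hospital, and the denominator 11,067 is the sum 5,734 + 5,333; that reading is an inference from the table, not a statement made in the text.

```python
def protection_quotient(shared: int, total: int) -> float:
    """Fraction of variants that stay private: 1 - shared / total."""
    return 1 - shared / total


if __name__ == "__main__":
    # Freeman Sheldon Syndrome cohort: 3 of 767 variants revealed.
    print(f"{protection_quotient(3, 767):.1%}")
    # Two hospitals: 2 * 159 of 5,734 + 5,333 variants revealed.
    print(f"{protection_quotient(2 * 159, 5734 + 5333):.1%}")
```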
Supplementary Figures and Tables

Supplementary Figure 1
[Illustration. (A) Genotype data: gene, position, reference, alternate. (B) Vector representation: a position array (T/F per position) and a gene array (0/1 per gene). (C) Small cohort: gene arrays are summed per gene and passed to MAX. Small family (m, f, a1): position arrays are passed to SETDIFF. Two hospitals (Hospital 1, Hospital 2): position arrays are combined with AND for INTERSECTION.]

Supplementary Figure 2
Supplementary Figure 3

Supplementary Figure 4
Supplementary Figure 5
Figure and Table Legends

Figure 1. Yao’s protocol for secure multiparty computation. Steps 0-7 describe the overall secure protocol for computing any function F between two or more parties, F(A, B, .., Y, Z). We first describe a secure two-party computation protocol between Alice (A) and Bob (B). Step 0: Alice and Bob are trying to compute a joint function without revealing their inputs to the other party. Step 1: Alice creates a key/box for each possible value of each input (0 or 1). Step 2: Alice double locks (double encrypts) each of the four possible outputs by placing the relevant output note in two boxes corresponding to each combination of the two inputs. Step 3: Alice gives Bob the option of choosing exactly one of two possible keys, labeled BT and BF. Step 4: Bob picks up exactly one key Bb, where b corresponds to his hidden input, which only he knows (the oblivious transfer protocol ensures that Bob can only pick up one key). After Bob makes his selection, Alice shuffles the doubly-locked boxes and hands them to Bob along with the key Aa corresponding to her input a. Steps 1-4 are repeated for each of the inputs to the function. Step 5: For each operator in the function that depends only on input values (i.e., the first “layer” of the circuit), Bob has four doubly-locked boxes and two keys, Aa and Bb, but he does not know Alice’s input and Alice does not know Bob’s input. He uses Aa and Bb and tries to unlock all four boxes. Only one of the four doubly-locked boxes will successfully open, revealing the joint output without revealing Alice’s or Bob’s inputs. Step 6: The revealed output yields the key for the next operation (gate) in the circuit. Steps 5 and 6 are repeated for each operation in the function. At the end of the computation, instead of keys, Bob obtains the values that make up the output of the computation. Step 7: This secure two-party computation process can be expanded to N parties by using additive secret sharing between two non-colluding cloud servers. The N-input function is thus transformed into a two-input function.
Table 1. Summary of results for different secure genomic multiparty computation scenarios, all using real patient data.
Supplementary Figure 1. Representing genomic data as vectors for secure computation. (A) Each individual keeps their personal genome private. (B) They are asked to fill in a position array/vector with True and False values depending on whether they have a rare functional variant at the listed position, or a 0/1 value in a gene array depending on whether they have none/some rare functional variant/s in each listed gene. (C) The resulting position/gene vectors are used to obtain the results of Table 1.
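The three operations on these vectors can be written down in the clear as a reference for what the secure protocol computes. The sketch below is an illustration only: it performs no encryption or secret sharing, the function names are hypothetical, and the toy vectors are modeled on the example values drawn in Supplementary Figure 1.

```python
def max_genes(gene_arrays):
    """MAX: per-gene counts across the cohort and the indices of the top genes."""
    counts = [sum(column) for column in zip(*gene_arrays)]
    top = max(counts)
    return counts, [i for i, c in enumerate(counts) if c == top]


def set_diff(proband, relatives):
    """SETDIFF: positions set in the proband and in none of the relatives."""
    return [bool(v) and not any(rel[i] for rel in relatives)
            for i, v in enumerate(proband)]


def intersection(hospital_1, hospital_2):
    """INTERSECTION: positions flagged by both hospitals (element-wise AND)."""
    return [bool(a) and bool(b) for a, b in zip(hospital_1, hospital_2)]


if __name__ == "__main__":
    cohort = [
        [0, 1, 0, 0, 1, 0, 0, 1, 0],
        [0, 0, 0, 1, 1, 0, 0, 0, 1],
        [0, 1, 0, 0, 1, 0, 1, 0, 0],
    ]
    print(max_genes(cohort))
    mother, father = [False, False, True], [False, False, False]
    child = [False, True, True]
    print(set_diff(child, [mother, father]))
    print(intersection([True, False, False, True], [False, False, True, True]))
```

Because each output element depends only on the matching input elements (apart from the final maximum), these operations map naturally onto per-position circuits.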
Supplementary Figure 2. Encryption and decryption overview. A secret-key encryption scheme consists of three algorithms: (A) a setup algorithm which outputs a secret key (usually a long random string); (B) an encryption algorithm that takes in a secret key $k$ and a message $m$ and produces an encryption of $m$ (called a ciphertext); and (C) a decryption algorithm that takes in the same secret key $k$ and a ciphertext and produces the original message. We write $\mathrm{Enc}_k(m)$ to denote an encryption of the message $m$ under the secret key $k$. The correctness requirement for an encryption scheme states that decrypting the ciphertext output by $\mathrm{Enc}_k(m)$ using the secret key $k$ should yield the original message (plaintext) $m$. (D) The security requirement for a secret-key encryption scheme states that anyone who does not possess the secret key $k$ cannot distinguish an encryption of a message $m_1$ from an encryption of a message $m_2$, irrespective of the choice of messages $m_1$ and $m_2$. In other words, without the secret key, the ciphertext does not reveal any information about the encrypted message.
Supplementary Figure 3. Computation using circuits. (A) A Boolean circuit consists of a sequence of logic gates (e.g., AND gates, OR gates, and NOT gates). Each logic gate takes one or two bits as input and produces a single bit of output. In a Boolean circuit, the outputs of one logic gate can be used as the input to another logic gate. We refer to these values as the intermediate values in the computation. In the circuit depicted in the figure, the inputs to the circuit are denoted $x_1, x_2, x_3, x_4, x_5$ and the outputs of the circuit are denoted $y_1, y_2$. Specifically, this particular circuit implements a function over five input bits and produces two output bits. (B) Each gate in the Boolean circuit can be described by a truth table that specifies the mapping from each configuration of the input bits to a corresponding output bit. In the case of an AND gate, there are two input bits, and the output is 1 if and only if both input bits are 1. Otherwise, the output is 0.
Supplementary Figure 4. Performance scale-up for secure computation. Bandwidth, compute (CPU) time, and overall protocol execution (wall clock) time for the secure MAX, SETDIFF and INTERSECTION scenarios of Table 1, using a single thread on two servers, one located on the East Coast and the other on the West Coast. (A) When increasing the number of unrelated subjects in a small cohort study, all parameters grow logarithmically. (B) When increasing the number of family members in an affected/nonaffected scenario, parameters also grow logarithmically. (C) In the two-hospital scenario, when increasing the number of genomic positions of potential interest (e.g., from the exome to the non-coding genome), all parameters grow linearly. Note that all three scenarios (A-C) perform the bulk of their computation on each element of the input vector separately (Supplementary Figure 1C). All scenarios are thus simple to parallelize for maximum speed-up using multiple threads and nodes.
Supplementary Figure 5. Boolean building blocks for composing complex functions. (A) The basic building blocks we use to build our circuits for identifying common mutations and shared de novo variants include addition, comparison, equality, and multiplexer circuits. An addition circuit $\mathrm{ADD}_k$ on $k$-bit inputs takes two $k$-bit values and outputs the $k$-bit representation of their sum (addition is performed modulo $2^k$). The $\mathrm{LT}_k$ and $\mathrm{EQ}_k$ circuits implement the less-than and equality operations, respectively, on $k$-bit inputs. The $\mathrm{MUX}_k$ circuit implements a multiplexer circuit which, on inputs a selection bit $s \in \{0,1\}$ and two $k$-bit values $x_0, x_1$, outputs $x_s$. The individual circuits can be efficiently constructed using AND gates and XOR gates, as described by Kolesnikov et al.⁴¹ These basic circuit building blocks can be composed to build a max circuit $\mathrm{MAX}_k$ on two inputs (each of length $k$), which in turn can be used to build a max circuit on $n$ inputs. (B) This circuit computes the argmax over $n$ additively secret-shared values $v_1, \ldots, v_n$. The circuit operates by first combining the shares and then taking the max over the resulting vector of values. The argmax is represented by a bit-string of length $n$, where a position $i$ has value 1 if $v_i$ is equal to the max value, and 0 otherwise. (C) This circuit computes the set of genes (represented by indices) that are present in a test vector $v_1, \ldots, v_n$ but not present in a pool $w_1, \ldots, w_n$. The counts of the mutations appearing in the pool are additively secret shared. The circuit first combines the shares, and then identifies the indices $i$ that appear in the test vector ($v_i = 1$) but are not present in the pool ($w_i = 0$). The circuit outputs $s_i = 1$ if the gene indexed by $i$ occurs in the target genome but not in the test pool, and 0 otherwise.
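The logic of the circuits in panels (B) and (C) can be mirrored in plain Python as a reference for their input/output behavior. This is a sketch under stated assumptions: it runs in the clear rather than inside a garbled circuit, the 8-bit width and all function names are hypothetical, and the test vector in the set-difference example is passed unshared because the legend only states that the pool counts are secret shared.

```python
import secrets

K = 8                  # bit width of each value (illustrative choice)
MODULUS = 1 << K       # shares are combined modulo 2^k


def share(value: int):
    """Split a value into two additive shares modulo 2^k."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def combine(shares_1, shares_2):
    """First stage of both circuits: add the two share vectors."""
    return [(a + b) % MODULUS for a, b in zip(shares_1, shares_2)]


def argmax_indicator(shares_1, shares_2):
    """Panel (B): bit i is 1 exactly when value i equals the maximum."""
    values = combine(shares_1, shares_2)
    top = max(values)
    return [int(v == top) for v in values]


def setdiff_indicator(test_vector, pool_shares_1, pool_shares_2):
    """Panel (C): bit i is 1 when gene i is in the test vector but not the pool."""
    pool = combine(pool_shares_1, pool_shares_2)
    return [int(t == 1 and p == 0) for t, p in zip(test_vector, pool)]


if __name__ == "__main__":
    counts = [0, 2, 0, 1, 3, 0, 1, 1, 1]
    s1, s2 = zip(*(share(c) for c in counts))
    print(argmax_indicator(s1, s2))
```

In the actual protocol the `max` and the equality tests would be expressed with the comparison, equality, and multiplexer circuits from panel (A).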