
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence

Pramod J. Sadalage
Martin Fowler

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.


Library of Congress Cataloging-in-Publication Data:

Sadalage, Pramod J.
NoSQL distilled: a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-321-82662-6 (pbk.: alk. paper) -- ISBN 0-321-82662-0 (pbk.: alk. paper)
1. Databases--Technological innovations. 2. Information storage and retrieval systems. I. Fowler, Martin, 1963- II. Title.
QA76.9.D32S228 2013
005.74--dc23

Copyright © 2013 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-321-82662-6
ISBN-10: 0-321-82662-0
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, August 2012


For my teachers Gajanan Chinchwadkar, Dattatraya Mhaskar, and Arvind Parchure. You inspired me the most, thank you.
- Pramod

For Cindy
- Martin

Contents

Preface

Part I: Understand

Chapter 1: Why NoSQL?
    1.1 The Value of Relational Databases
        1.1.1 Getting at Persistent Data
        1.1.2 Concurrency
        1.1.3 Integration
        1.1.4 A (Mostly) Standard Model
    1.2 Impedance Mismatch
    1.3 Application and Integration Databases
    1.4 Attack of the Clusters
    1.5 The Emergence of NoSQL
    1.6 Key Points

Chapter 2: Aggregate Data Models
    2.1 Aggregates
        2.1.1 Example of Relations and Aggregates
        2.1.2 Consequences of Aggregate Orientation
    2.2 Key-Value and Document Data Models
    2.3 Column-Family Stores
    2.4 Summarizing Aggregate-Oriented Databases
    2.5 Further Reading
    2.6 Key Points

Chapter 3: More Details on Data Models
    3.1 Relationships
    3.2 Graph Databases
    3.3 Schemaless Databases
    3.4 Materialized Views
    3.5 Modeling for Data Access
    3.6 Key Points

Chapter 4: Distribution Models
    4.1 Single Server
    4.2 Sharding
    4.3 Master-Slave Replication
    4.4 Peer-to-Peer Replication
    4.5 Combining Sharding and Replication
    4.6 Key Points

Chapter 5: Consistency
    5.1 Update Consistency
    5.2 Read Consistency
    5.3 Relaxing Consistency
        5.3.1 The CAP Theorem
    5.4 Relaxing Durability
    5.5 Quorums
    5.6 Further Reading
    5.7 Key Points

Chapter 6: Version Stamps
    6.1 Business and System Transactions
    6.2 Version Stamps on Multiple Nodes
    6.3 Key Points

Chapter 7: Map-Reduce
    7.1 Basic Map-Reduce
    7.2 Partitioning and Combining
    7.3 Composing Map-Reduce Calculations
        7.3.1 A Two Stage Map-Reduce Example
        7.3.2 Incremental Map-Reduce
    7.4 Further Reading
    7.5 Key Points

Part II: Implement

Chapter 8: Key-Value Databases
    8.1 What Is a Key-Value Store
    8.2 Key-Value Store Features
        8.2.1 Consistency
        8.2.2 Transactions
        8.2.3 Query Features
        8.2.4 Structure of Data
        8.2.5 Scaling
    8.3 Suitable Use Cases
        8.3.1 Storing Session Information
        8.3.2 User Profiles, Preferences
        8.3.3 Shopping Cart Data
    8.4 When Not to Use
        8.4.1 Relationships among Data
        8.4.2 Multioperation Transactions
        8.4.3 Query by Data
        8.4.4 Operations by Sets

Chapter 9: Document Databases
    9.1 What Is a Document Database?
    9.2 Features
        9.2.1 Consistency
        9.2.2 Transactions
        9.2.3 Availability
        9.2.4 Query Features
        9.2.5 Scaling
    9.3 Suitable Use Cases
        9.3.1 Event Logging
        9.3.2 Content Management Systems, Blogging Platforms
        9.3.3 Web Analytics or Real-Time Analytics
        9.3.4 E-Commerce Applications
    9.4 When Not to Use
        9.4.1 Complex Transactions Spanning Different Operations
        9.4.2 Queries against Varying Aggregate Structure

Chapter 10: Column-Family Stores
    10.1 What Is a Column-Family Data Store?
    10.2 Features
        10.2.1 Consistency
        10.2.2 Transactions
        10.2.3 Availability
        10.2.4 Query Features
        10.2.5 Scaling
    10.3 Suitable Use Cases
        10.3.1 Event Logging
        10.3.2 Content Management Systems, Blogging Platforms
        10.3.3 Counters
        10.3.4 Expiring Usage
    10.4 When Not to Use

Chapter 11: Graph Databases
    11.1 What Is a Graph Database?
    11.2 Features
        11.2.1 Consistency
        11.2.2 Transactions
        11.2.3 Availability
        11.2.4 Query Features
        11.2.5 Scaling
    11.3 Suitable Use Cases
        11.3.1 Connected Data
        11.3.2 Routing, Dispatch, and Location-Based Services
        11.3.3 Recommendation Engines
    11.4 When Not to Use

Chapter 12: Schema Migrations
    12.1 Schema Changes
    12.2 Schema Changes in RDBMS
        12.2.1 Migrations for Green Field Projects
        12.2.2 Migrations in Legacy Projects
    12.3 Schema Changes in a NoSQL Data Store
        12.3.1 Incremental Migration
        12.3.2 Migrations in Graph Databases
        12.3.3 Changing Aggregate Structure
    12.4 Further Reading
    12.5 Key Points

Chapter 13: Polyglot Persistence
    13.1 Disparate Data Storage Needs
    13.2 Polyglot Data Store Usage
    13.3 Service Usage over Direct Data Store Usage
    13.4 Expanding for Better Functionality
    13.5 Choosing the Right Technology
    13.6 Enterprise Concerns with Polyglot Persistence
    13.7 Deployment Complexity
    13.8 Key Points

Chapter 14: Beyond NoSQL
    14.1 File Systems
    14.2 Event Sourcing
    14.3 Memory Image
    14.4 Version Control
    14.5 XML Databases
    14.6 Object Databases
    14.7 Key Points

Chapter 15: Choosing Your Database
    15.1 Programmer Productivity
    15.2 Data-Access Performance
    15.3 Sticking with the Default
    15.4 Hedging Your Bets
    15.5 Key Points
    15.6 Final Thoughts

Bibliography

Index

Preface

We’ve spent some twenty years in the world of enterprise computing. We’ve seen many things change in languages, architectures, platforms, and processes. But through all this time one thing has stayed constant: relational databases store the data. There have been challengers, some of which have had success in some niches, but on the whole the data storage question for architects has been the question of which relational database to use.

There is a lot of value in the stability of this reign. An organization’s data lasts much longer than its programs (at least that’s what people tell us; we’ve seen plenty of very old programs out there). It’s valuable to have a stable data storage that’s well understood and accessible from many application programming platforms.

Now, however, there’s a new challenger on the block under the confrontational tag of NoSQL. It’s born out of a need to handle larger data volumes, which forced a fundamental shift to building large hardware platforms through clusters of commodity servers. This need has also raised long-running concerns about the difficulties of making application code play well with the relational data model.

The term “NoSQL” is very ill-defined. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. They embrace schemaless data, run on clusters, and have the ability to trade off traditional consistency for other useful properties. Advocates of NoSQL databases claim that they can build systems that are more performant, scale much better, and are easier to program with.

Is this the first rattle of the death knell for relational databases, or yet another pretender to the throne? Our answer to that is “neither.” Relational databases are a powerful tool that we expect to be using for many more decades, but we do see a profound change in that relational databases won’t be the only databases in use. Our view is that we are entering a world of Polyglot Persistence where enterprises, and even individual applications, use multiple technologies for data management. As a result, architects will need to be familiar with these technologies and be able to evaluate which ones to use for differing needs. Had we not thought that, we wouldn’t have spent the time and effort writing this book.

This book seeks to give you enough information to answer the question of whether NoSQL databases are worth serious consideration for your future projects. Every project is different, and there’s no way we can write a simple decision tree to choose the right data store. Instead, what we are attempting here is to provide you with enough background on how NoSQL databases work, so that you can make those judgments yourself without having to trawl the whole web. We’ve deliberately made this a small book, so you can get this overview pretty quickly. It won’t answer your questions definitively, but it should narrow down the range of options you have to consider and help you understand what questions you need to ask.

Why Are NoSQL Databases Interesting?

We see two primary reasons why people consider using a NoSQL database.

• Application development productivity. A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.

• Large-scale data. Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is designed to run on a single machine, but it is usually more economical to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they make a better fit for big data scenarios.

What’s in the Book

We’ve broken this book up into two parts. The first part concentrates on core concepts that we think you need to know in order to judge whether NoSQL databases are relevant for you and how they differ. In the second part we concentrate more on implementing systems with NoSQL databases.

Chapter 1 begins by explaining why NoSQL has had such a rapid rise: the need to process larger data volumes led to a shift, in large systems, from scaling vertically to scaling horizontally on clusters. This explains an important feature of the data model of many NoSQL databases, namely the explicit storage of a rich structure of closely related data that is accessed as a unit. In this book we call this kind of structure an aggregate.

Chapter 2 describes how aggregates manifest themselves in three of the main data models in NoSQL land: key-value (“Key-Value and Document Data Models,” p. 20), document (“Key-Value and Document Data Models,” p. 20), and column family (“Column-Family Stores,” p. 21) databases. Aggregates provide a natural unit of interaction for many kinds of applications, which both improves running on a cluster and makes it easier to program the data access. Chapter 3 shifts to the downside of aggregates: the difficulty of handling relationships (“Relationships,” p. 25) between entities in different aggregates. This leads us naturally to graph databases (“Graph Databases,” p. 26), a NoSQL data model that doesn’t fit into the aggregate-oriented camp. We also look at the common characteristic of NoSQL databases that operate without a schema (“Schemaless Databases,” p. 28), a feature that provides some greater flexibility, but not as much as you might first think.

Having covered the data-modeling aspect of NoSQL, we move on to distribution: Chapter 4 describes how databases distribute data to run on clusters. This breaks down into sharding (“Sharding,” p. 38) and replication, the latter being either master-slave (“Master-Slave Replication,” p. 40) or peer-to-peer (“Peer-to-Peer Replication,” p. 42) replication. With the distribution models defined, we can then move on to the issue of consistency. NoSQL databases provide a more varied range of consistency options than relational databases, which is a consequence of being friendly to clusters. So Chapter 5 talks about how consistency changes for updates (“Update Consistency,” p. 47) and reads (“Read Consistency,” p. 49), the role of quorums (“Quorums,” p. 57), and how even some durability (“Relaxing Durability,” p. 56) can be traded off. If you’ve heard anything about NoSQL, you’ll almost certainly have heard of the CAP theorem; the “The CAP Theorem” section on p. 53 explains what it is and how it fits in.

While these chapters concentrate primarily on the principles of how data gets distributed and kept consistent, the next two chapters talk about a couple of important tools that make this work. Chapter 6 describes version stamps, which are for keeping track of changes and detecting inconsistencies. Chapter 7 outlines map-reduce, which is a particular way of organizing parallel computation that fits in well with clusters and thus with NoSQL systems.

Once we’re done with concepts, we move to implementation issues by looking at some example databases under the four key categories: Chapter 8 uses Riak as an example of key-value databases, Chapter 9 takes MongoDB as an example for document databases, Chapter 10 chooses Cassandra to explore column-family databases, and finally Chapter 11 plucks Neo4J as an example of graph databases. We must stress that this is not a comprehensive study; there are too many out there to write about, let alone for us to try. Nor does our choice of examples imply any recommendations. Our aim here is to give you a feel for the variety of stores that exist and for how different database technologies use the concepts we outlined earlier. You’ll see what kind of code you need to write to program against these systems and get a glimpse of the mindset you’ll need to use them.

A common statement about NoSQL databases is that since they have no schema, there is no difficulty in changing the structure of data during the life of an application. We disagree: a schemaless database still has an implicit schema that needs change discipline when you implement it, so Chapter 12 explains how to do data migration both for strong schemas and for schemaless systems.

All of this should make it clear that NoSQL is not a single thing, nor is it something that will replace relational databases. Chapter 13 looks at this future world of Polyglot Persistence, where multiple data-storage worlds coexist, even within the same application. Chapter 14 then expands our horizons beyond this book, considering other technologies that we haven’t covered that may also be a part of this polyglot-persistent world.

With all of this information, you are finally at a point where you can make a choice of what data storage technologies to use, so our final chapter (Chapter 15, “Choosing Your Database,” p. 147) offers some advice on how to think about these choices. In our view, there are two key factors: finding a productive programming model where the data storage model is well aligned to your application, and ensuring that you can get the data access performance and resilience you need. Since this is early days in the NoSQL life story, we’re afraid that we don’t have a well-defined procedure to follow, and you’ll need to test your options in the context of your needs.

This is a brief overview; we’ve been very deliberate in limiting the size of this book. We’ve selected the information we think is the most important, so that you don’t have to. If you are going to seriously investigate these technologies, you’ll need to go further than what we cover here, but we hope this book provides a good context to start you on your way.

We also need to stress that this is a very volatile field of the computer industry. Important aspects of these stores are changing every year: new features, new databases. We’ve made a strong effort to focus on concepts, which we think will be valuable to understand even as the underlying technology changes. We’re pretty confident that most of what we say will have this longevity, but absolutely sure that not all of it will.

Who Should Read This Book

Our target audience for this book is people who are considering using some form of a NoSQL database. This may be for a new project, or because they are hitting barriers that are suggesting a shift on an existing project.

Our aim is to give you enough information to know whether NoSQL technology makes sense for your needs, and if so which tool to explore in more depth. Our primary imagined audience is an architect or technical lead, but we think this book is also valuable for people involved in software management who want to get an overview of this new technology. We also think that if you’re a developer who wants an overview of this technology, this book will be a good starting point.

We don’t go into the details of programming and deploying specific databases here; we leave that for specialist books. We’ve also been very firm on a page limit, to keep this book a brief introduction. This is the kind of book we think you should be able to read on a plane flight: It won’t answer all your questions but should give you a good set of questions to ask.

If you’ve already delved into the world of NoSQL, this book probably won’t commit any new items to your store of knowledge. However, it may still be useful by helping you explain what you’ve learned to others. Making sense of the issues around NoSQL is important, particularly if you’re trying to persuade someone to consider using NoSQL in a project.

What Are the Databases

In this book, we’ve followed a common approach of categorizing NoSQL databases according to their data model. Here is a table of the four data models and some of the databases that fit each model. This is not a comprehensive list; it only mentions the more common databases we’ve come across. At the time of writing, you can find more comprehensive lists at http://nosql-database.org and http://nosql.mypopescu.com/kb/nosql. For each category, we mark with italics the database we use as an example in the relevant chapter.

Our goal is to pick a representative tool from each of the categories of the databases. While we talk about specific examples, most of the discussion should apply to the entire category, even though these products are unique and cannot be generalized as such. We will pick one database for each of the key-value, document, column family, and graph databases; where appropriate, we will mention other products that may fulfill a specific feature need.

This classification by data model is useful, but crude. The lines between the different data models, such as the distinction between key-value and document databases (“Key-Value and Document Data Models,” p. 20), are often blurry. Many databases don’t fit cleanly into categories; for example, OrientDB calls itself both a document database and a graph database.

Acknowledgments

Our first thanks go to our colleagues at ThoughtWorks, many of whom have been applying NoSQL to our delivery projects over the last couple of years. Their experiences have been a primary source both of our motivation in writing this book and of practical information on the value of this technology. The positive experience we’ve had so far with NoSQL data stores is the basis of our view that this is an important technology and a significant shift in data storage.

We’d also like to thank various groups who have given public talks, published articles, and blogs on their use of NoSQL. Much progress in software development gets hidden when people don’t share with their peers what they’ve learned. Particular thanks here go to Google and Amazon whose papers on Bigtable and Dynamo were very influential in getting the NoSQL movement going. We also thank companies that have sponsored and contributed to the open-source development of NoSQL databases. An interesting difference with previous shifts in data storage is the degree to which the NoSQL movement is rooted in open-source work.

Particular thanks go to ThoughtWorks for giving us the time to work on this book. We joined ThoughtWorks at around the same time and have been here for over a decade. ThoughtWorks continues to be a very hospitable home for us, a source of knowledge and practice, and a welcome environment of openly sharing what we learn, so different from the traditional systems delivery organizations.

Bethany Anders-Beck, Ilias Bartolini, Tim Berglund, Duncan Craig, Paul Duvall, Oren Eini, Perryn Fowler, Michael Hunger, Eric Kascic, Joshua Kerievsky, Anand Krishnaswamy, Bobby Norton, Ade Oshineye, Thiyagu Palanisamy, Prasanna Pendse, Dan Pritchett, David Rice, Mike Roberts, Marko Rodriquez, Andrew Slocum, Toby Tripp, Steve Vinoski, Dean Wampler, Jim Webber, and Wee Witthawaskul reviewed early drafts of this book and helped us improve it with their advice.

Additionally, Pramod would like to thank Schaumburg Library for providing great service and quiet space for writing; Arhana and Arula, my beautiful daughters, for their understanding that daddy would go to the library and not take them along; Rupali, my beloved wife, for her immense support and help in keeping me focused.

Part I: Understand

Chapter 1. Why NoSQL?

For almost as long as we’ve been in the software profession, relational databases have been the default choice for serious data storage, especially in the world of enterprise applications. If you’re an architect starting a new project, your only choice is likely to be which relational database to use. (And often not even that, if your company has a dominant vendor.) There have been times when a database technology threatened to take a piece of the action, such as object databases in the 1990s, but these alternatives never got anywhere.

After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise. In this chapter we’ll explore why relational databases became so dominant, and why we think the current rise of NoSQL databases isn’t a flash in the pan.

1.1. The Value of Relational Databases

Relational databases have become such an embedded part of our computing culture that it’s easy to take them for granted. It’s therefore useful to revisit the benefits they provide.

1.1.1. Getting at Persistent Data

Probably the most obvious value of a database is keeping large amounts of persistent data. Most computer architectures have the notion of two areas of memory: a fast volatile “main memory” and a larger but slower “backing store.” Main memory is both limited in space and loses all data when you lose power or something bad happens to the operating system. Therefore, to keep data around, we write it to a backing store, commonly seen as a disk (although these days that disk can be persistent memory).

The backing store can be organized in all sorts of ways. For many productivity applications (such as word processors), it’s a file in the file system of the operating system. For most enterprise applications, however, the backing store is a database. The database allows more flexibility than a file system in storing large amounts of data in a way that allows an application program to get at small bits of that information quickly and easily.

1.1.2. Concurrency

Enterprise applications tend to have many people looking at the same body of data at once, possibly modifying that data. Most of the time they are working on different areas of that data, but occasionally they operate on the same bit of data. As a result, we have to worry about coordinating these interactions to avoid such things as double booking of hotel rooms.

Concurrency is notoriously difficult to get right, with all sorts of errors that can trap even the most careful programmers. Since enterprise applications can have lots of users and other systems all working concurrently, there’s a lot of room for bad things to happen. Relational databases help handle this by controlling all access to their data through transactions. While this isn’t a cure-all (you still have to handle a transactional error when you try to book a room that’s just gone), the transactional mechanism has worked well to contain the complexity of concurrency.

Transactions also play a role in error handling. With transactions, you can make a change, and if an error occurs during the processing of the change you can roll back the transaction to clean things up.
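To make the rollback behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. The bookings table, the book_rooms function, and the guest and room values are hypothetical names chosen for illustration; the primary key stands in for whatever rule a real system would use to prevent a double booking.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical bookings table."""
    conn = sqlite3.connect(":memory:")
    # One booking per room per night; the primary key enforces this.
    conn.execute(
        "CREATE TABLE bookings ("
        " room INTEGER, night TEXT, guest TEXT,"
        " PRIMARY KEY (room, night))"
    )
    return conn

def book_rooms(conn, guest, rooms, night):
    """Book several rooms as one unit of work; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            for room in rooms:
                conn.execute(
                    "INSERT INTO bookings (room, night, guest) VALUES (?, ?, ?)",
                    (room, night, guest),
                )
        return True
    except sqlite3.IntegrityError:
        # One of the rooms was already taken; the partial work is undone.
        return False

conn = make_db()
first = book_rooms(conn, "alice", [101], "2012-08-01")
second = book_rooms(conn, "bob", [102, 101], "2012-08-01")  # room 101 is taken
rows = conn.execute("SELECT room, guest FROM bookings ORDER BY room").fetchall()
```

In this sketch the second request fails on room 101, so its earlier insert for room 102 is rolled back as well and only the first booking remains. The application still has to decide what to tell the second guest, which is the transactional error handling mentioned above.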

1.1.3. Integration

Enterprise applications live in a rich ecosystem that requires multiple applications, written by different teams, to collaborate in order to get things done. This kind of inter-application collaboration is awkward because it means pushing the human organizational boundaries. Applications often need to use the same data and updates made through one application have to be visible to others.

A common way to do this is shared database integration [Hohpe and Woolf] where multiple applications store their data in a single database. Using a single database allows all the applications to use each others’ data easily, while the database’s concurrency control handles multiple applications in the same way as it handles multiple users in a single application.

1.1.4. A (Mostly) Standard Model

Relational databases have succeeded because they provide the core benefits we outlined earlier in a (mostly) standard way. As a result, developers and database professionals can learn the basic relational model and apply it in many projects. Although there are differences between different relational databases, the core mechanisms remain the same: Different vendors’ SQL dialects are similar, transactions operate in mostly the same way.

1.2. Impedance Mismatch

Relational databases provide many advantages, but they are by no means perfect. Even from their early days, there have been lots of frustrations with them.

For application developers, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures. The relational data model organizes data into a structure of tables and rows, or more properly, relations and tuples. In the relational model, a tuple is a set of name-value pairs and a relation is a set of tuples. (The relational definition of a tuple is slightly different from that in mathematics and many programming languages with a tuple data type, where a tuple is a sequence of values.) All operations in SQL consume and return relations, which leads to the mathematically elegant relational algebra.

This foundation on relations provides a certain elegance and simplicity, but it also introduces limitations. In particular, the values in a relational tuple have to be simple: they cannot contain any structure, such as a nested record or a list. This limitation isn’t true for in-memory data structures, which can take on much richer structures than relations. As a result, if you want to use a richer in-memory data structure, you have to translate it to a relational representation to store it on disk. Hence the impedance mismatch: two different representations that require translation (see Figure 1.1).

Figure 1.1. An order, which looks like a single aggregate structure in the UI, is split into many rows from many tables in a relational database
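The translation between the two representations can be sketched in a few lines of Python. The order structure, its field names, and the three row lists below are hypothetical, chosen only to illustrate the idea; a real schema would differ.

```python
# A hypothetical in-memory order: one nested structure (an aggregate).
order = {
    "id": 99,
    "customer": "Ann",
    "line_items": [
        {"product": "NoSQL Distilled", "quantity": 1, "price": 32.45},
        {"product": "Refactoring", "quantity": 2, "price": 40.00},
    ],
    "payment": {"card": "Amex", "amount": 112.45},
}

def to_rows(order):
    """Flatten the nested order into flat rows for three relational tables."""
    # Each row holds only simple values; nesting is replaced by repeating
    # the order id, which plays the role of a foreign key.
    order_rows = [(order["id"], order["customer"])]
    line_rows = [
        (order["id"], item["product"], item["quantity"], item["price"])
        for item in order["line_items"]
    ]
    payment_rows = [
        (order["id"], order["payment"]["card"], order["payment"]["amount"])
    ]
    return order_rows, line_rows, payment_rows

def from_rows(order_rows, line_rows, payment_rows):
    """Reassemble the nested order from its rows (the reverse translation)."""
    (order_id, customer), = order_rows
    (_, card, amount), = payment_rows
    return {
        "id": order_id,
        "customer": customer,
        "line_items": [
            {"product": p, "quantity": q, "price": price}
            for (oid, p, q, price) in line_rows
            if oid == order_id
        ],
        "payment": {"card": card, "amount": amount},
    }

orders, lines, payments = to_rows(order)
round_trip = from_rows(orders, lines, payments)
```

Neither function is hard to write, but both have to be written, kept in step with the schema, and run on every save and load. That recurring translation cost is the mismatch.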

The impedance mismatch is a major source of frustration to application developers, and in the 1990s many people believed that it would lead to relational databases being replaced with databases that replicate the in-memory data structures to disk. That decade was marked with the growth of object-oriented programming languages, and with them came object-oriented databases, both looking to be the dominant environment for software development in the new millennium.

However, while object-oriented languages succeeded in becoming the major force in programming, object-oriented databases faded into obscurity. Relational databases saw off the challenge by stressing their role as an integration mechanism, supported by a mostly standard language of data manipulation (SQL) and a growing professional divide between application developers and database administrators.

Impedance mismatch has been made much easier to deal with by the wide availability of object-relational mapping frameworks, such as Hibernate and iBATIS, that implement well-known mapping patterns [Fowler PoEAA], but the mapping problem is still an issue. Object-relational mapping frameworks remove a lot of grunt work, but can become a problem of their own when people try too hard to ignore the database and query performance suffers.

Relational databases continued to dominate the enterprise computing world in the 2000s, but during that decade cracks began to open in their dominance.

1.3. Application and Integration Databases

The exact reasons why relational databases triumphed over OO databases are still the subject of an occasional pub debate for developers of a certain age. But in our view, the primary factor was the role of SQL as an integration mechanism between applications. In this scenario, the database acts as an integration database, with multiple applications, usually developed by separate teams, storing their data in a common database. This improves communication because all the applications are operating on a consistent set of persistent data.

There are downsides to shared database integration. A structure that’s designed to integrate many applications ends up being more complex (indeed, often dramatically more complex) than any single application needs. Furthermore, should an application want to make changes to its data storage, it needs to coordinate with all the other applications using the database. Different applications have different structural and performance needs, so an index required by one application may cause a problematic hit on inserts for another. The fact that each application is usually owned by a separate team also means that the database usually cannot trust applications to update the data in a way that preserves database integrity and thus needs to take responsibility for that within the database itself.

A different approach is to treat your database as an application database, which is only directly accessed by a single application codebase that’s looked after by a single team. With an application database, only the team using the application needs to know about the database structure, which makes it much easier to maintain and evolve the schema. Since the application team controls both the database and the application code, the responsibility for database integrity can be put in the application code.

Interoperability concerns can now shift to the interfaces of the application, allowing for better interaction protocols and providing support for changing them. During the 2000s we saw a distinct shift to web services [Daigneau], where applications would communicate over HTTP. Web services enabled a new form of a widely used communication mechanism, a challenger to using SQL with shared databases. (Much of this work was done under the banner of “Service-Oriented Architecture,” a term most notable for its lack of a consistent meaning.)

An interesting aspect of this shift to web services as an integration mechanism was that it resulted in more flexibility for the structure of the data that was being exchanged. If you communicate with SQL, the data must be structured as relations. However, with a service, you are able to use richer data structures with nested records and lists. These are usually represented as documents in XML or, more recently, JSON. In general, with remote communication you want to reduce the number of round trips involved in the interaction, so it’s useful to be able to put a rich structure of information into a single request or response.

If you are going to use services for integration, most of the time web services, using text over HTTP, are the way to go. However, if you are dealing with highly performance-sensitive interactions, you may need a binary protocol. Only do this if you are sure you have the need, as text protocols are easier to work with; consider the example of the Internet.

Once you have made the decision to use an application database, you get more freedom of choosing a database. Since there is a decoupling between your internal database and the services with which you talk to the outside world, the outside world doesn’t have to care how you store your data, allowing you to consider nonrelational options. Furthermore, there are many features of relational databases, such as security, that are less useful to an application database because they can be done by the enclosing application instead.

Despite this freedom, however, it wasn’t apparent that application databases led to a big rush to alternative data stores. Most teams that embraced the application database approach stuck with relational databases. After all, using an application database yields many advantages even ignoring the database flexibility (which is why we generally recommend it). Relational databases are familiar and usually work very well or, at least, well enough. Perhaps, given time, we might have seen the shift to application databases open a real crack in the relational hegemony, but such cracks came from another source.
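As a small illustration of the richer structures that services can exchange, the sketch below builds a nested response and sends it as a single JSON document. The customer and order fields are hypothetical, and the snippet uses only Python's standard json module; it stands in for whatever a real service would put on the wire.

```python
import json

# A hypothetical service response: nested records and a list in one payload.
response = {
    "customer": {"id": 1, "name": "Martin"},
    "orders": [
        {"id": 99, "items": [{"product": "NoSQL Distilled", "quantity": 1}]},
        {"id": 100, "items": [{"product": "Refactoring", "quantity": 2}]},
    ],
}

# Text on the wire: a single document, so the client needs one round trip.
wire_text = json.dumps(response)

# The receiving application parses it back into the same nested structure.
received = json.loads(wire_text)
item_count = sum(len(o["items"]) for o in received["orders"])
```

Fetching the same information as relations would typically mean separate result sets for the customer, the orders, and the items, which the client would then have to stitch together.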

1.4.AttackoftheClustersAtthebeginningofthenewmillenniumthetechnologyworldwashitbythebustingofthe1990sdot-combubble.WhilethissawmanypeoplequestioningtheeconomicfutureoftheInternet,the2000sdidseeseverallargewebpropertiesdramaticallyincreaseinscale.Thisincreaseinscalewashappeningalongmanydimensions.Websitesstartedtrackingactivityand

structureinaverydetailedway.Largesetsofdataappeared:links,socialnetworks,activityinlogs,mappingdata.Withthisgrowthindatacameagrowthinusers—asthebiggestwebsitesgrewtobevastestatesregularlyservinghugenumbersofvisitors.Copingwiththeincreaseindataandtrafficrequiredmorecomputingresources.Tohandlethis

kind of increase, you have two choices: up or out. Scaling up implies bigger machines, more processors, disk storage, and memory. But bigger machines get more and more expensive, not to mention that there are real limits as your size increases. The alternative is to use lots of small machines in a cluster. A cluster of small machines can use commodity hardware and ends up being cheaper at these kinds of scales. It can also be more resilient—while individual machine failures are common, the overall cluster can be built to keep going despite such failures, providing high reliability.

As large properties moved towards clusters, that revealed a new problem—relational databases are not designed to be run on clusters. Clustered relational databases, such as the Oracle RAC or Microsoft SQL Server, work on the concept of a shared disk subsystem. They use a cluster-aware file system that writes to a highly available disk subsystem—but this means the cluster still has the disk subsystem as a single point of failure. Relational databases could also be run as separate servers for different sets of data, effectively sharding (“Sharding,” p. 38) the database. While this separates the load, all the sharding has to be controlled by the application, which has to keep track of which database server to talk to for each bit of data. Also, we lose any querying, referential integrity, transactions, or consistency controls that cross shards. A phrase we often hear in this context from people who’ve done this is “unnatural acts.”

These technical issues are exacerbated by licensing costs. Commercial relational databases are usually priced on a single-server assumption, so running on a cluster raised prices and led to frustrating negotiations with purchasing departments.

This mismatch between relational databases and clusters led some organizations to consider an alternative route to data storage. Two companies in particular—Google and Amazon—have been very influential. Both were on the forefront of running large clusters of this kind; furthermore, they were capturing huge amounts of data. These things gave them the motive. Both were successful and growing companies with strong technical components, which gave them the means and opportunity. It was no wonder they had murder in mind for their relational databases. As the 2000s drew on, both companies produced brief but highly influential papers about their efforts: BigTable from Google and Dynamo from Amazon.

It’s often said that Amazon and Google operate at scales far removed from most organizations, so the solutions they needed may not be relevant to an average organization. While it’s true that most software projects don’t need that level of scale, it’s also true that more and more organizations are beginning to explore what they can do by capturing and processing more data—and to run into the same problems.

So, as more information leaked out about what Google and Amazon had done, people began to explore making databases along similar lines—explicitly designed to live in a world of clusters. While the earlier menaces to relational dominance turned out to be phantoms, the threat from clusters was serious.
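The application-controlled sharding mentioned in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the server names and the hash-based routing rule are hypothetical and do not come from any particular product.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(customer_id: str) -> str:
    """Pick the server that holds this customer's data.

    The application, not the database, owns this routing rule.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


def shards_for_all(customer_ids: list[str]) -> set[str]:
    """Work that spans customers may have to visit several servers."""
    return {shard_for(cid) for cid in customer_ids}
```

Every read and write has to go through something like `shard_for`, and anything that spans the result of `shards_for_all` gets no cross-shard query, transaction, or integrity support from the database itself.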

1.5. The Emergence of NoSQL

It’s a wonderful irony that the term “NoSQL” first made its appearance in the late 90s as the name of an open-source relational database [Strozzi NoSQL]. Led by Carlo Strozzi, this database stores its tables as ASCII files, each tuple represented by a line with fields separated by tabs. The name comes from the fact that the database doesn’t use SQL as a query language. Instead, the database is manipulated through shell scripts that can be combined into the usual UNIX pipelines. Other than the terminological coincidence, Strozzi’s NoSQL had no influence on the databases we describe in this book.

The usage of “NoSQL” that we recognize today traces back to a meetup on June 11, 2009 in San Francisco organized by Johan Oskarsson, a software developer based in London. The example of BigTable and Dynamo had inspired a bunch of projects experimenting with alternative data storage, and discussions of these had become a feature of the better software conferences around that time. Johan was interested in finding out more about some of these new databases while he was in San Francisco for a Hadoop summit. Since he had little time there, he felt that it wouldn’t be feasible to visit them all, so he decided to host a meetup where they could all come together and present their work to whoever was interested.

Johan wanted a name for the meetup—something that would make a good Twitter hashtag: short, memorable, and without too many Google hits so that a search on the name would quickly find the meetup. He asked for suggestions on the #cassandra IRC channel and got a few, selecting the suggestion of “NoSQL” from Eric Evans (a developer at Rackspace, no connection to the DDD Eric Evans). While it had the disadvantage of being negative and not really describing these systems, it did fit the hashtag criteria. At the time they were thinking of only naming a single meeting and were not expecting it to catch on to name this entire technology trend [Oskarsson].

The term “NoSQL” caught on like wildfire, but it’s never been a term that’s had much in the way of a strong definition. The original call [NoSQL Meetup] for the meetup asked for “open-source, distributed, nonrelational databases.” The talks there [NoSQL Debrief] were from Voldemort, Cassandra, Dynomite, HBase, Hypertable, CouchDB, and MongoDB—but the term has never been confined to that original septet. There’s no generally accepted definition, nor an authority to provide one, so all we can do is discuss some common characteristics of the databases that tend to be called “NoSQL.”

To begin with, there is the obvious point that NoSQL databases don’t use SQL. Some of them do have query languages, and it makes sense for them to be similar to SQL in order to make them easier to learn. Cassandra’s CQL is like this—“exactly like SQL (except where it’s not)” [CQL]. But so far none have implemented anything that would fit even the rather flexible notion of standard SQL. It will be interesting to see what happens if an established NoSQL database decides to implement a reasonably standard SQL; the only predictable outcome for such an eventuality is plenty of argument.

Another important characteristic of these databases is that they are generally open-source projects. Although the term NoSQL is frequently applied to closed-source systems, there’s a notion that NoSQL is an open-source phenomenon.

Most NoSQL databases are driven by the need to run on clusters, and this is certainly true of those that were talked about during the initial meetup. This has an effect on their data model as well as their approach to consistency. Relational databases use ACID transactions (p. 19) to handle consistency across the whole database. This inherently clashes with a cluster environment, so NoSQL databases offer a range of options for consistency and distribution.

However, not all NoSQL databases are strongly oriented towards running on clusters. Graph databases are one style of NoSQL databases that uses a distribution model similar to relational databases but offers a different data model that makes it better at handling data with complex relationships.

NoSQL databases are generally based on the needs of the early 21st century web estates, so usually only systems developed during that time frame are called NoSQL—thus ruling out hordes of databases created before the new millennium, let alone BC (Before Codd).

NoSQL databases operate without a schema, allowing you to freely add fields to database records without having to define any changes in structure first. This is particularly useful when dealing with nonuniform data and custom fields which forced relational databases to use names like customField6 or custom field tables that are awkward to process and understand.

All of the above are common characteristics of things that we see described as NoSQL databases. None of these are definitional, and indeed it’s likely that there will never be a coherent definition of “NoSQL” (sigh). However, this crude set of characteristics has been our guide in writing this book. Our chief enthusiasm with this subject is that the rise of NoSQL has opened up the range of options for data storage. Consequently, this opening up shouldn’t be confined to what’s usually classed as a NoSQL store. We hope that other data storage options will become more acceptable, including many that predate the NoSQL movement. There is a limit, however, to what we can usefully discuss in this book, so we’ve decided to concentrate on this noDefinition.

When you first hear “NoSQL,” an immediate question is what does it stand for—a “no” to SQL?

Most people who talk about NoSQL say that it really means “Not Only SQL,” but this interpretation has a couple of problems. Most people write “NoSQL” whereas “Not Only SQL” would be written “NOSQL.” Also, there wouldn’t be much point in calling something a NoSQL database under the “not only” meaning—because then, Oracle or Postgres would fit that definition, we would prove that black equals white and would all get run over on crosswalks.

To resolve this, we suggest that you don’t worry about what the term stands for, but rather about what it means (which is recommended with most acronyms). Thus, when “NoSQL” is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.

The “not-only” interpretation does have its value, as it describes the ecosystem that many people think is the future of databases. This is in fact what we consider to be the most important contribution of this way of thinking—it’s better to think of NoSQL as a movement rather than a technology. We don’t think that relational databases are going away—they are still going to be the most common form of database in use. Even though we’ve written this book, we still recommend relational databases. Their familiarity, stability, feature set, and available support are compelling arguments for most projects.

The change is that now we see relational databases as one option for data storage. This point of view is often referred to as polyglot persistence—using different data stores in different circumstances. Instead of just picking a relational database because everyone does, we need to understand the nature of the data we’re storing and how we want to manipulate it. The result is that most organizations will have a mix of data storage technologies for different circumstances.

In order to make this polyglot world work, our view is that organizations also need to shift from integration databases to application databases. Indeed, we assume in this book that you’ll be using a NoSQL database as an application database; we don’t generally consider NoSQL databases a good choice for integration databases. We don’t see this as a disadvantage as we think that even if you don’t use NoSQL, shifting to encapsulating data in services is a good direction to take.

In our account of the history of NoSQL development, we’ve concentrated on big data running on clusters. While we think this is the key thing that drove the opening up of the database world, it isn’t the only reason we see project teams considering NoSQL databases. An equally important reason is the old frustration with the impedance mismatch problem. The big data concerns have created an opportunity for people to think freshly about their data storage needs, and some development teams see that using a NoSQL database can help their productivity by simplifying their database access even if they have no need to scale beyond a single machine.

So, as you read the rest of this book, remember there are two primary reasons for considering NoSQL. One is to handle data access with sizes and performance that demand a cluster; the other is to improve the productivity of application development by using a more convenient data interaction style.

1.6. Key Points

• Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.

• Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.

• There is a movement away from using databases as integration points towards encapsulating databases within applications and integrating through services.

• The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to run efficiently on clusters.

• NoSQL is an accidental neologism. There is no prescriptive definition—all you can make is an observation of common characteristics.

• The common characteristics of NoSQL databases are
  • Not using the relational model
  • Running well on clusters
  • Open-source
  • Built for the 21st century web estates
  • Schemaless

• The most important result of the rise of NoSQL is Polyglot Persistence.

Chapter 2. Aggregate Data Models

A data model is the model through which we perceive and manipulate our data. For people using a database, the data model describes how we interact with the data in the database. This is distinct from a storage model, which describes how the database stores and manipulates the data internally. In an ideal world, we should be ignorant of the storage model, but in practice we need at least some inkling of it—primarily to achieve decent performance.

In conversation, the term “data model” often means the model of the specific data in an application. A developer might point to an entity-relationship diagram of their database and refer to that as their data model containing customers, orders, products, and the like. However, in this book we’ll mostly be using “data model” to refer to the model by which the database organizes data—what might be more formally called a metamodel.

The dominant data model of the last couple of decades is the relational data model, which is best visualized as a set of tables, rather like a page of a spreadsheet. Each table has rows, with each row representing some entity of interest. We describe this entity through columns, each having a single value. A column may refer to another row in the same or different table, which constitutes a relationship between those entities. (We’re using informal but common terminology when we speak of tables and rows; the more formal terms would be relations and tuples.)

One of the most obvious shifts with NoSQL is a move away from the relational model. Each NoSQL solution has a different model that it uses, which we put into four categories widely used in the NoSQL ecosystem: key-value, document, column-family, and graph. Of these, the first three share a common characteristic of their data models which we will call aggregate orientation. In this chapter we’ll explain what we mean by aggregate orientation and what it means for data models.

2.1. Aggregates

The relational model takes the information that we want to store and divides it into tuples (rows). A tuple is a limited data structure: It captures a set of values, so you cannot nest one tuple within another to get nested records, nor can you put a list of values or tuples within another. This simplicity underpins the relational model—it allows us to think of all operations as operating on and returning tuples.

Aggregate orientation takes a different approach. It recognizes that often, you want to operate on data in units that have a more complex structure than a set of tuples. It can be handy to think in terms of a complex record that allows lists and other record structures to be nested inside it. As we’ll see, key-value, document, and column-family databases all make use of this more complex record. However, there is no common term for this complex record; in this book we use the term “aggregate.”

Aggregate is a term that comes from Domain-Driven Design [Evans]. In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit. In particular, it is a unit for data manipulation and management of consistency. Typically, we like to update aggregates with atomic operations and communicate with our data storage in terms of aggregates. This definition matches really well with how key-value, document, and column-family databases work. Dealing in aggregates makes it much easier for these databases to handle operating on a cluster, since the aggregate makes a natural unit for replication and sharding. Aggregates are also often easier for application programmers to work with, since they often manipulate data through aggregate structures.

2.1.1. Example of Relations and Aggregates

At this point, an example may help explain what we’re talking about. Let’s assume we have to build an e-commerce website; we are going to be selling items directly to customers over the web, and we will have to store information about users, our product catalog, orders, shipping addresses, billing addresses, and payment data. We can use this scenario to model the data using a relational data store as well as NoSQL data stores and talk about their pros and cons. For a relational database, we might start with a data model shown in Figure 2.1.

Figure 2.1. Data model oriented around a relational database (using UML notation [Fowler UML])

Figure 2.2 presents some sample data for this model.

Figure 2.2. Typical data using RDBMS data model

As we’re good relational soldiers, everything is properly normalized, so that no data is repeated in multiple tables. We also have referential integrity. A realistic order system would naturally be more involved than this, but this is the benefit of the rarefied air of a book.

Now let’s see how this model might look when we think in more aggregate-oriented terms (Figure 2.3).

Figure 2.3. An aggregate data model

Again, we have some sample data, which we’ll show in JSON format as that’s a common representation for data in NoSQL land.

// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}

// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}

In this model, we have two main aggregates: customer and order. We’ve used the black-diamond composition marker in UML to show how data fits into the aggregation structure. The customer contains a list of billing addresses; the order contains a list of order items, a shipping address, and payments. The payment itself contains a billing address for that payment.

A single logical address record appears three times in the example data, but instead of using IDs it’s treated as a value and copied each time. This fits the domain where we would not want the shipping address, nor the payment’s billing address, to change. In a relational database, we would ensure that the address rows aren’t updated for this case, making a new row instead. With aggregates, we can copy the whole address structure into the aggregate as we need to.

The link between the customer and the order isn’t within either aggregate—it’s a relationship between aggregates. Similarly, the link from an order item would cross into a separate aggregate structure for products, which we haven’t gone into. We’ve shown the product name as part of the order item here—this kind of denormalization is similar to the tradeoffs with relational databases, but is more common with aggregates because we want to minimize the number of aggregates we access during a data interaction.

The important thing to notice here isn’t the particular way we’ve drawn the aggregate boundary so much as the fact that you have to think about accessing that data—and make that part of your thinking when developing the application data model. Indeed we could draw our aggregate boundaries differently, putting all the orders for a customer into the customer aggregate (Figure 2.4).

Figure 2.4. Embed all the objects for customer and the customer’s orders

Using the above data model, an example Customer and Order would look like this:

// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}

Like most things in modeling, there’s no universal answer for how to draw your aggregate boundaries. It depends entirely on how you tend to manipulate your data. If you tend to access a customer together with all of that customer’s orders at once, then you would prefer a single aggregate. However, if you tend to focus on accessing a single order at a time, then you should prefer having separate aggregates for each order. Naturally, this is very context-specific; some applications will prefer one or the other, even within a single system, which is exactly why many people prefer aggregate ignorance.
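To make the tradeoff between the two boundary choices concrete, here is a minimal Python sketch. It assumes plain in-memory dictionaries standing in for a database, and the function names are hypothetical; the data is trimmed from the JSON samples above.

```python
# Design A: customers and orders are separate aggregates, linked by customerId.
customers = {1: {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]}}
orders = {
    99: {
        "id": 99,
        "customerId": 1,
        "orderItems": [{"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}],
        "shippingAddress": [{"city": "Chicago"}],
    }
}

# Design B: orders are embedded inside the customer aggregate.
customers_with_orders = {1: {**customers[1], "orders": [orders[99]]}}


def order_total_separate(order_id):
    """Design A: one lookup fetches exactly one order aggregate."""
    order = orders[order_id]
    return sum(item["price"] for item in order["orderItems"])


def order_total_embedded(customer_id, order_id):
    """Design B: the whole customer aggregate is read to reach one order."""
    customer = customers_with_orders[customer_id]
    order = next(o for o in customer["orders"] if o["id"] == order_id)
    return sum(item["price"] for item in order["orderItems"])
```

Both functions return the same total; what differs is how much data has to be fetched, and which key you need in hand, to answer the question.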

2.1.2. Consequences of Aggregate Orientation

While the relational mapping captures the various data elements and their relationships reasonably well, it does so without any notion of an aggregate entity. In our domain language, we might say that an order consists of order items, a shipping address, and a payment. This can be expressed in the relational model in terms of foreign key relationships—but there is nothing to distinguish relationships that represent aggregations from those that don’t. As a result, the database can’t use a knowledge of aggregate structure to help it store and distribute the data.

Various data modeling techniques have provided ways of marking aggregate or composite structures. The problem, however, is that modelers rarely provide any semantics for what makes an aggregate relationship different from any other; where there are semantics, they vary. When working with aggregate-oriented databases, we have a clearer semantics to consider by focusing on the unit of interaction with the data storage. It is, however, not a logical data property: It’s all about how the data is being used by applications—a concern that is often outside the bounds of data modeling.

Relational databases have no concept of aggregate within their data model, so we call them aggregate-ignorant. In the NoSQL world, graph databases are also aggregate-ignorant. Being aggregate-ignorant is not a bad thing. It’s often difficult to draw aggregate boundaries well, particularly if the same data is used in many different contexts. An order makes a good aggregate when a customer is making and reviewing orders, and when the retailer is processing orders. However, if a retailer wants to analyze its product sales over the last few months, then an order aggregate becomes a hindrance. To get to product sales history, you’ll have to dig into every aggregate in the database. So an aggregate structure may help with some data interactions but be an obstacle for others. An aggregate-ignorant model allows you to easily look at the data in different ways, so it is a better choice when you don’t have a primary structure for manipulating your data.

The clinching reason for aggregate orientation is that it helps greatly with running on a cluster, which as you’ll remember is the killer argument for the rise of NoSQL. If we’re running on a cluster, we need to minimize how many nodes we need to query when we are gathering data. By explicitly including aggregates, we give the database important information about which bits of data will be manipulated together, and thus should live on the same node.

Aggregates have an important consequence for transactions. Relational databases allow you to manipulate any combination of rows from any tables in a single transaction. Such transactions are called ACID transactions: Atomic, Consistent, Isolated, and Durable. ACID is a rather contrived acronym; the real point is the atomicity: Many rows spanning many tables are updated as a single operation. This operation either succeeds or fails in its entirety, and concurrent operations are isolated from each other so they cannot see a partial update.

It’s often said that NoSQL databases don’t support ACID transactions and thus sacrifice consistency. This is a rather sweeping simplification. In general, it’s true that aggregate-oriented databases don’t have ACID transactions that span multiple aggregates. Instead, they support atomic manipulation of a single aggregate at a time. This means that if we need to manipulate multiple aggregates in an atomic way, we have to manage that ourselves in the application code. In practice, we find that most of the time we are able to keep our atomicity needs to within a single aggregate; indeed, that’s part of the consideration for deciding how to divide up our data into aggregates. We should also remember that graph and other aggregate-ignorant databases usually do support ACID transactions similar to relational databases. Above all, the topic of consistency is much more involved than whether a database is ACID or not, as we’ll explore in Chapter 5.
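The product-sales point made earlier in this section is easy to see in code. The sketch below assumes a handful of hypothetical order aggregates held in a dictionary; a real store would expose them through its own API, but the shape of the work is the same.

```python
from collections import defaultdict

# Hypothetical order aggregates keyed by order ID.
orders = {
    99: {"customerId": 1, "orderItems": [{"productId": 27, "price": 32.45}]},
    100: {
        "customerId": 2,
        "orderItems": [
            {"productId": 27, "price": 32.45},
            {"productId": 31, "price": 10.00},
        ],
    },
}


def units_sold_by_product(order_aggregates):
    """Count items sold per product by opening every order aggregate."""
    counts = defaultdict(int)
    for order in order_aggregates.values():
        for item in order["orderItems"]:
            counts[item["productId"]] += 1
    return dict(counts)
```

Because the order is the unit of storage, the analysis has no shortcut: it visits every order, wherever in the cluster that order lives.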

2.2. Key-Value and Document Data Models

We said earlier on that key-value and document databases were strongly aggregate-oriented. What we meant by this was that we think of these databases as primarily constructed through aggregates. Both of these types of databases consist of lots of aggregates with each aggregate having a key or ID that’s used to get at the data.

The two models differ in that in a key-value database, the aggregate is opaque to the database—just some big blob of mostly meaningless bits. In contrast, a document database is able to see a structure in the aggregate. The advantage of opacity is that we can store whatever we like in the aggregate. The database may impose some general size limit, but other than that we have complete freedom. A document database imposes limits on what we can place in it, defining allowable structures and types. In return, however, we get more flexibility in access.

With a key-value store, we can only access an aggregate by lookup based on its key. With a document database, we can submit queries to the database based on the fields in the aggregate, we can retrieve part of the aggregate rather than the whole thing, and the database can create indexes based on the contents of the aggregate.

In practice, the line between key-value and document gets a bit blurry. People often put an ID field in a document database to do a key-value style lookup. Databases classified as key-value databases may give you structures for data beyond just an opaque aggregate. For example, Riak allows you to add metadata to aggregates for indexing and interaggregate links; Redis allows you to break down the aggregate into lists or sets. You can support querying by integrating search tools such as Solr. As an example, Riak includes a search facility that uses Solr-like searching on any aggregates that are stored as JSON or XML structures.

Despite this blurriness, the general distinction still holds. With key-value databases, we expect to mostly look up aggregates using a key. With document databases, we mostly expect to submit some form of query based on the internal structure of the document; this might be a key, but it’s more likely to be something else.
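The difference in access can be sketched with two toy stores. This is an illustration only: `kv_get` and `doc_find` are hypothetical helpers over in-memory dictionaries, not the API of any real key-value or document database.

```python
import json

# Key-value style: the store sees only opaque bytes under each key.
kv_store = {}
kv_store["order:99"] = json.dumps(
    {"id": 99, "customerId": 1, "city": "Chicago"}
).encode("utf-8")


def kv_get(key):
    """The only access path is the key; the caller decodes the blob."""
    return json.loads(kv_store[key].decode("utf-8"))


# Document style: the store can look inside each aggregate.
doc_store = {
    99: {"id": 99, "customerId": 1, "city": "Chicago"},
    100: {"id": 100, "customerId": 2, "city": "Boston"},
}


def doc_find(field, value, projection=None):
    """Query by a field and optionally return only part of each document."""
    matches = [d for d in doc_store.values() if d.get(field) == value]
    if projection is None:
        return matches
    return [{k: d[k] for k in projection if k in d} for d in matches]
```

With the key-value store, finding the orders shipped to Boston would mean fetching and decoding every blob in application code; with the document store, `doc_find("city", "Boston", ["id"])` asks for the field directly and returns only the part requested.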

2.3. Column-Family Stores

One of the early and influential NoSQL databases was Google’s BigTable [Chang etc.]. Its name conjured up a tabular structure which it realized with sparse columns and no schema. As you’ll soon see, it doesn’t help to think of this structure as a table; rather, it is a two-level map. But, however you think about the structure, it has been a model that influenced later databases such as HBase and Cassandra.

These databases with a bigtable-style data model are often referred to as column stores, but that name has been around for a while to describe a different animal. Pre-NoSQL column stores, such as C-Store [C-Store], were happy with SQL and the relational model. The thing that made them different was the way in which they physically stored data. Most databases have a row as a unit of storage which, in particular, helps write performance. However, there are many scenarios where writes are rare, but you often need to read a few columns of many rows at once. In this situation, it’s better to store groups of columns for all rows as the basic storage unit—which is why these databases are called column stores.

Bigtable and its offspring follow this notion of storing groups of columns (column families) together, but part company with C-Store and friends by abandoning the relational model and SQL. In this book, we refer to this class of databases as column-family databases.

Perhaps the best way to think of the column-family model is as a two-level aggregate structure. As with key-value stores, the first key is often described as a row identifier, picking up the aggregate of interest. The difference with column-family structures is that this row aggregate is itself formed of a map of more detailed values. These second-level values are referred to as columns. As well as accessing the row as a whole, operations also allow picking out a particular column, so to get a particular customer’s name from Figure 2.5 you could do something like get('1234', 'name').
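A two-level map of this kind can be sketched directly with nested dictionaries. The `get` function below is a hypothetical stand-in written for illustration, and the row contents are assumed rather than copied from the figure.

```python
# Row key -> column key -> value. The data is hypothetical.
rows = {
    "1234": {
        "name": "Martin",
        "billingAddress": "Chicago",
    }
}


def get(row_key, column_key=None):
    """Return a whole row aggregate, or one column picked out of it."""
    row = rows[row_key]
    if column_key is None:
        return row
    return row[column_key]
```

Calling `get('1234')` returns the whole row aggregate, while `get('1234', 'name')` picks out a single column from it.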

Figure 2.5. Representing customer information in a column-family structure

Column-family databases organize their columns into column families. Each column has to be part of a single column family, and the column acts as a unit for access, with the assumption that data for a particular column family will be usually accessed together. This also gives you a couple of ways to think about how the data is structured.

• Row-oriented: Each row is an aggregate (for example, customer with the ID of 1234) with column families representing useful chunks of data (profile, order history) within that aggregate.

• Column-oriented: Each column family defines a record type (e.g., customer profiles) with rows for each of the records. You then think of a row as the join of records in all column families.

This latter aspect reflects the columnar nature of column-family databases. Since the database knows about these common groupings of data, it can use this information for its storage and access behavior. Even though a document database declares some structure to the database, each document is still seen as a single unit. Column families give a two-dimensional quality to column-family databases.

This terminology is as established by Google Bigtable and HBase, but Cassandra looks at things slightly differently. A row in Cassandra only occurs in one column family, but that column family may contain supercolumns—columns that contain nested columns. The supercolumns in Cassandra are the best equivalent to the classic Bigtable column families.

It can still be confusing to think of column-families as tables. You can add any column to any row, and rows can have very different column keys. While new columns are added to rows during regular database access, defining new column families is much rarer and may involve stopping the database for it to happen.

The example of Figure 2.5 illustrates another aspect of column-family databases that may be unfamiliar for people used to relational tables: the orders column family. Since columns can be added freely, you can model a list of items by making each item a separate column. This is very odd if you think of a column family as a table, but quite natural if you think of a column-family row as an aggregate. Cassandra uses the terms “wide” and “skinny.” Skinny rows have few columns with the same columns used across the many different rows. In this case, the column family defines a record type, each row is a record, and each column is a field. A wide row has many columns (perhaps thousands), with rows having very different columns. A wide column family models a list, with each column being one element in that list.

A consequence of wide column families is that a column family may define a sort order for its columns. This way we can access orders by their order key and access ranges of orders by their keys. While this might not be useful if we keyed orders by their IDs, it would be if we made the key out of a concatenation of date and ID (e.g., 20111027-1001).

Although it’s useful to distinguish column families by their wide or skinny nature, there’s no technical reason why a column family cannot contain both field-like columns and list-like columns—although doing this would confuse the sort ordering.
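The effect of sorted column keys in a wide column family can be sketched as follows. This assumes an in-memory row with a skinny profile family and a wide orders family; the values and the `columns_in_range` helper are hypothetical, and a real database would keep the keys sorted in storage rather than sorting on every call.

```python
from bisect import bisect_left, bisect_right

# Column family -> column key -> value, for one row (values are hypothetical).
row_1234 = {
    # Skinny family: a fixed set of field-like columns.
    "profile": {"name": "Martin", "billingAddress": "Chicago"},
    # Wide family: one column per order, keyed by date plus order ID.
    "orders": {
        "20111027-1001": "order data A",
        "20111101-1002": "order data B",
        "20111215-1003": "order data C",
    },
}


def columns_in_range(family, start_key, end_key):
    """Return columns whose keys fall in [start_key, end_key], in key order."""
    keys = sorted(family)
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, end_key)
    return [(k, family[k]) for k in keys[lo:hi]]
```

Because the keys start with the date, `columns_in_range(row_1234["orders"], "20111101", "20111231")` selects the November and December orders without inspecting their contents.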

2.4. Summarizing Aggregate-Oriented Databases

At this point, we’ve covered enough material to give you a reasonable overview of the three different styles of aggregate-oriented data models and how they differ.

What they all share is the notion of an aggregate indexed by a key that you can use for lookup. This aggregate is central to running on a cluster, as the database will ensure that all the data for an aggregate is stored together on one node. The aggregate also acts as the atomic unit for updates, providing a useful, if limited, amount of transactional control.

Within that notion of aggregate, we have some differences. The key-value data model treats the aggregate as an opaque whole, which means you can only do key lookup for the whole aggregate—you cannot run a query nor retrieve a part of the aggregate.

The document model makes the aggregate transparent to the database allowing you to do queries and partial retrievals. However, since the document has no schema, the database cannot act much on the structure of the document to optimize the storage and retrieval of parts of the aggregate.

Column-family models divide the aggregate into column families, allowing the database to treat them as units of data within the row aggregate. This imposes some structure on the aggregate but allows the database to take advantage of that structure to improve its accessibility.

2.5. Further Reading

For more on the general concept of aggregates, which are often used with relational databases too, see [Evans]. The Domain-Driven Design community is the best source for further information about aggregates—recent information usually appears at http://domaindrivendesign.org.

2.6. Key Points

• An aggregate is a collection of data that we interact with as a unit. Aggregates form the boundaries for ACID operations with the database.

• Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database.

• Aggregates make it easier for the database to manage data storage over clusters.

• Aggregate-oriented databases work best when most data interaction is done with the same aggregate; aggregate-ignorant databases are better when interactions use data organized in many different formations.

Chapter 3. More Details on Data Models

So far we’ve covered the key feature in most NoSQL databases: their use of aggregates and how aggregate-oriented databases model aggregates in different ways. While aggregates are a central part of the NoSQL story, there is more to the data modeling side than that, and we’ll explore these further concepts in this chapter.

3.1. Relationships

Aggregates are useful in that they put together data that is commonly accessed together. But there are still lots of cases where data that’s related is accessed differently. Consider the relationship between a customer and all of his orders. Some applications will want to access the order history whenever they access the customer; this fits in well with combining the customer with his order history into a single aggregate. Other applications, however, want to process orders individually and thus model orders as independent aggregates.

In this case, you’ll want separate order and customer aggregates but with some kind of relationship between them so that any work on an order can look up customer data. The simplest way to provide such a link is to embed the ID of the customer within the order’s aggregate data. That way, if you need data from the customer record, you read the order, ferret out the customer ID, and make another call to the database to read the customer data. This will work, and will be just fine in many scenarios—but the database will be ignorant of the relationship in the data. This can be important because there are times when it’s useful for the database to know about these links.

As a result, many databases—even key-value stores—provide ways to make these relationships visible to the database. Document stores make the content of the aggregate available to the database to form indexes and queries. Riak, a key-value store, allows you to put link information in metadata, supporting partial retrieval and link-walking capability.

An important aspect of relationships between aggregates is how they handle updates. Aggregate-oriented databases treat the aggregate as the unit of data retrieval. Consequently, atomicity is only supported within the contents of a single aggregate. If you update multiple aggregates at once, you have to deal with a failure partway through yourself. Relational databases help you with this by allowing you to modify multiple records in a single transaction, providing ACID guarantees while altering many rows.

All of this means that aggregate-oriented databases become more awkward as you need to operate across multiple aggregates. There are various ways to deal with this, which we’ll explore later in this chapter, but the fundamental awkwardness remains.

This may imply that if you have data based on lots of relationships, you should prefer a relational database over a NoSQL store. While that’s true for aggregate-oriented databases, it’s worth remembering that relational databases aren’t all that stellar with complex relationships either. While you can express queries involving joins in SQL, things quickly get very hairy—both with SQL writing and with the resulting performance—as the number of joins mounts up.

This makes it a good moment to introduce another category of databases that’s often lumped into the NoSQL pile.
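Before moving on, the two points of this section (following a link by hand, and the lack of atomicity across aggregates) can be sketched together. The stores and function names below are hypothetical in-memory stand-ins, and the crash is simulated with an exception.

```python
# Hypothetical in-memory stand-ins for two collections of aggregates.
customers = {1: {"id": 1, "name": "Martin"}}
orders = {99: {"id": 99, "customerId": 1, "status": "new"}}


def customer_for_order(order_id):
    """Follow the link by hand: read the order, then read the customer."""
    order = orders[order_id]                # first call to the database
    return customers[order["customerId"]]   # second call to the database


def rename_customer_and_ship(order_id, new_name, fail_between=False):
    """Update two aggregates; each write is atomic only on its own."""
    order = orders[order_id]
    customers[order["customerId"]]["name"] = new_name  # write 1 succeeds
    if fail_between:
        raise RuntimeError("simulated crash between the two writes")
    order["status"] = "shipped"                         # write 2
```

If the failure happens between the writes, the customer has been renamed but the order is still marked as new, and it is up to the application to detect and repair that state.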

3.2.GraphDatabasesGraphdatabasesareanoddfishintheNoSQLpond.MostNoSQLdatabaseswereinspiredbythe

needtorunonclusters,whichledtoaggregate-orienteddatamodelsoflargerecordswithsimpleconnections.Graphdatabasesaremotivatedbyadifferentfrustrationwithrelationaldatabasesandthushaveanoppositemodel—smallrecordswithcomplexinterconnections,somethinglikeFigure3.1.

Figure3.1.AnexamplegraphstructureInthiscontext,agraphisn’tabarchartorhistogram;instead,werefertoagraphdatastructureof

nodesconnectedbyedges.InFigure3.1wehaveawebofinformationwhosenodesareverysmall(nothingmorethana

name)butthereisarichstructureofinterconnectionsbetweenthem.Withthisstructure,wecanaskquestionssuchas“findthebooksintheDatabasescategorythatarewrittenbysomeonewhomafriendofminelikes.”Graphdatabasesspecializeincapturingthissortofinformation—butonamuchlargerscalethana

readablediagramcouldcapture.Thisisidealforcapturinganydataconsistingofcomplexrelationshipssuchassocialnetworks,productpreferences,oreligibilityrules.Thefundamentaldatamodelofagraphdatabaseisverysimple:nodesconnectedbyedges(also

calledarcs).Beyondthisessentialcharacteristicthereisalotofvariationindatamodels—inparticular,whatmechanismsyouhavetostoredatainyournodesandedges.Aquicksampleofsomecurrentcapabilitiesillustratesthisvarietyofpossibilities:FlockDBissimplynodesandedgeswithnomechanismforadditionalattributes;Neo4JallowsyoutoattachJavaobjectsaspropertiestonodesandedgesinaschemalessfashion(“Features,”p.113);InfiniteGraphstoresyourJavaobjects,whicharesubclassesofitsbuilt-intypes,asnodesandedges.

Once you have built up a graph of nodes and edges, a graph database allows you to query that network with query operations designed with this kind of graph in mind. This is where the important differences between graph and relational databases come in. Although relational databases can implement relationships using foreign keys, the joins required to navigate around can get quite expensive, which means performance is often poor for highly connected data models. Graph databases make traversal along the relationships very cheap. A large part of this is because graph databases shift most of the work of navigating relationships from query time to insert time. This naturally pays off for situations where querying performance is more important than insert speed.

Most of the time you find data by navigating through the network of edges, with queries such as “tell me all the things that both Anna and Barbara like.” You do need a starting place, however, so usually some nodes can be indexed by an attribute such as ID. So you might start with an ID lookup (i.e., look up the people named “Anna” and “Barbara”) and then start using the edges. Still, graph databases expect most of your query work to be navigating relationships.

The emphasis on relationships makes graph databases very different from aggregate-oriented databases. This data model difference has consequences in other aspects, too; you’ll find such databases are more likely to run on a single server rather than distributed across clusters. ACID transactions need to cover multiple nodes and edges to maintain consistency. The only thing they have in common with aggregate-oriented databases is their rejection of the relational model and an upsurge in attention they received around the same time as the rest of the NoSQL field.
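To make the “Anna and Barbara” query concrete, here is a minimal sketch of an in-memory graph in Python. It is not how any particular graph database is implemented; the `Graph` class, the `likes` edge type, and the sample titles are all invented for illustration. The point it tries to show is that the adjacency information is built when edges are inserted, so the query itself is a lookup followed by a set intersection.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: nodes are names, edges are typed and directed."""

    def __init__(self):
        # node -> set of (edge_type, target); filled in when edges are inserted
        self.out_edges = defaultdict(set)

    def add_edge(self, source, edge_type, target):
        self.out_edges[source].add((edge_type, target))

    def neighbors(self, node, edge_type):
        # Follow edges of one type outward from a node; no join is needed.
        return {target for kind, target in self.out_edges.get(node, set())
                if kind == edge_type}


graph = Graph()
graph.add_edge("Anna", "likes", "NoSQL Distilled")
graph.add_edge("Anna", "likes", "Refactoring")
graph.add_edge("Barbara", "likes", "Refactoring")
graph.add_edge("Barbara", "likes", "Database Refactoring")

# Start from two nodes found by name, then navigate their "likes" edges.
liked_by_both = graph.neighbors("Anna", "likes") & graph.neighbors("Barbara", "likes")
print(liked_by_both)
```

The name lookup plays the role of the indexed starting place; everything after it is navigation along edges.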

3.3. Schemaless Databases

A common theme across all the forms of NoSQL databases is that they are schemaless. When you want to store data in a relational database, you first have to define a schema: a defined structure for the database which says what tables exist, which columns exist, and what data types each column can hold. Before you store some data, you have to have the schema defined for it.

With NoSQL databases, storing data is much more casual. A key-value store allows you to store any data you like under a key. A document database effectively does the same thing, since it makes no restrictions on the structure of the documents you store. Column-family databases allow you to store any data under any column you like. Graph databases allow you to freely add new edges and freely add properties to nodes and edges as you wish.

Advocates of schemalessness rejoice in this freedom and flexibility. With a schema, you have to figure out in advance what you need to store, but that can be hard to do. Without a schema binding you, you can easily store whatever you need. This allows you to easily change your data storage as you learn more about your project. You can easily add new things as you discover them. Furthermore, if you find you don’t need some things anymore, you can just stop storing them, without worrying about losing old data as you would if you delete columns in a relational schema.

As well as handling changes, a schemaless store also makes it easier to deal with nonuniform data: data where each record has a different set of fields. A schema puts all rows of a table into a straightjacket, which becomes awkward if you have different kinds of data in different rows. You either end up with lots of columns that are usually null (a sparse table), or you end up with meaningless columns like custom column 4. Schemalessness avoids this, allowing each record to contain just what it needs, no more, no less.

Schemalessness is appealing, and it certainly avoids many problems that exist with fixed-schema databases, but it brings some problems of its own. If all you are doing is storing some data and displaying it in a report as a simple list of fieldName: value lines then a schema is only going to get in the way. But usually we do more with our data than this, and we do it with programs that need to know that the billing address is called billingAddress and not addressForBilling and that the quantity field is going to be an integer 5 and not five.

The vital, if sometimes inconvenient, fact is that whenever we write a program that accesses data, that program almost always relies on some form of implicit schema. Unless it just says something like

    // pseudocode
    foreach (Record r in records) {
      foreach (Field f in r.fields) {
        print (f.name, f.value)
      }
    }

it will assume that certain field names are present and carry data with a certain meaning, and assume something about the type of data stored within that field. Programs are not humans; they cannot read “qty” and infer that that must be the same as “quantity”, at least not unless we specifically program them to do so. So, however schemaless our database is, there is usually an implicit schema present. This implicit schema is a set of assumptions about the data’s structure in the code that manipulates the data.

Having the implicit schema in the application code results in some problems. It means that in order to understand what data is present you have to dig into the application code. If that code is well structured you should be able to find a clear place from which to deduce the schema. But there are no guarantees; it all depends on how clear the application code is. Furthermore, the database remains ignorant of the schema: it can’t use the schema to help it decide how to store and retrieve data efficiently. It can’t apply its own validations upon that data to ensure that different applications don’t manipulate data in an inconsistent way.

These are the reasons why relational databases have a fixed schema, and indeed the reasons why most databases have had fixed schemas in the past. Schemas have value, and the rejection of schemas by NoSQL databases is indeed quite startling.

Essentially, a schemaless database shifts the schema into the application code that accesses it. This becomes problematic if multiple applications, developed by different people, access the same database. These problems can be reduced with a couple of approaches. One is to encapsulate all database interaction within a single application and integrate it with other applications using web services. This fits in well with many people’s current preference for using web services for integration. Another approach is to clearly delineate different areas of an aggregate for access by different applications. These could be different sections in a document database or different column families in a column-family database.

Although NoSQL fans often criticize relational schemas for having to be defined up front and being inflexible, that’s not really true. Relational schemas can be changed at any time with standard SQL commands. If necessary, you can create new columns in an ad-hoc way to store nonuniform data. We have only rarely seen this done, but it worked reasonably well where we have. Most of the time, however, nonuniformity in your data is a good reason to favor a schemaless database.

Schemalessness does have a big impact on changes of a database’s structure over time, particularly for more uniform data. Although it’s not practiced as widely as it ought to be, changing a relational database’s schema can be done in a controlled way. Similarly, you have to exercise control when changing how you store data in a schemaless database so that you can easily access both old and new data. Furthermore, the flexibility that schemalessness gives you only applies within an aggregate; if you need to change your aggregate boundaries, the migration is every bit as complex as it is in the relational case. We’ll talk more about database migration later (“Schema Migrations,” p. 123).
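The implicit schema discussed in this section can be made visible with a small sketch. The Python function below is only an illustration: `read_order_line` is a hypothetical helper, and the idea that older records spell the field “qty” is an assumption we made up to show how code has to be programmed explicitly to read both old and new data.

```python
def read_order_line(record):
    """Normalize one stored line item into the shape the program relies on."""
    # Implicit schema, part 1: which field names carry the quantity.
    # "qty" is a hypothetical older spelling that we choose to accept.
    raw_quantity = record.get("quantity", record.get("qty"))
    if raw_quantity is None:
        raise ValueError("line item has no quantity field")
    # Implicit schema, part 2: the quantity must be an integer.
    # int("five") raises ValueError, so bad data fails loudly here.
    return {"productId": record["productId"], "quantity": int(raw_quantity)}


new_style = read_order_line({"productId": 27, "quantity": 5})
old_style = read_order_line({"productId": 27, "qty": "5"})
```

Gathering such assumptions in one well-named place is one way to give readers of the code the “clear place from which to deduce the schema” mentioned above; the database itself still knows nothing about them.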

3.4. Materialized Views

When we talked about aggregate-oriented data models, we stressed their advantages. If you want to access orders, it’s useful to have all the data for an order contained in a single aggregate that can be stored and accessed as a unit. But aggregate-orientation has a corresponding disadvantage: What happens if a product manager wants to know how much a particular item has sold over the last couple of weeks? Now the aggregate-orientation works against you, forcing you to potentially read every order in the database to answer the question. You can reduce this burden by building an index on the product, but you’re still working against the aggregate structure.

Relational databases have an advantage here because their lack of aggregate structure allows them to support accessing data in different ways. Furthermore, they provide a convenient mechanism that allows you to look at data differently from the way it’s stored: views. A view is like a relational table (it is a relation) but it’s defined by computation over the base tables. When you access a view, the database computes the data in the view, a handy form of encapsulation.

Views provide a mechanism to hide from the client whether data is derived data or base data, but can’t avoid the fact that some views are expensive to compute. To cope with this, materialized views were invented, which are views that are computed in advance and cached on disk. Materialized views are effective for data that is read heavily but can stand being somewhat stale.

Although NoSQL databases don’t have views, they may have precomputed and cached queries, and they reuse the term “materialized view” to describe them. Materialized views are also much more central to aggregate-oriented databases than they are to relational systems, since most applications will have to deal with some queries that don’t fit well with the aggregate structure. (Often, NoSQL databases create materialized views using a map-reduce computation, which we’ll talk about in Chapter 7.)

There are two rough strategies for building a materialized view. The first is the eager approach where you update the materialized view at the same time you update the base data for it. In this case, adding an order would also update the purchase history aggregates for each product. This approach is good when you have more frequent reads of the materialized view than you have writes and you want the materialized views to be as fresh as possible. The application database (p. 7) approach is valuable here as it makes it easier to ensure that any updates to base data also update materialized views.

If you don’t want to pay that overhead on each update, you can run batch jobs to update the materialized views at regular intervals. You’ll need to understand your business requirements to assess how stale your materialized views can be.

You can build materialized views outside of the database by reading the data, computing the view, and saving it back to the database. More often databases will support building materialized views themselves. In this case, you provide the computation that needs to be done, and the database executes the computation when needed according to some parameters that you configure. This is particularly handy for eager updates of views with incremental map-reduce (“Incremental Map-Reduce,” p. 76).

Materialized views can be used within the same aggregate. An order document might include an order summary element that provides summary information about the order so that a query for an order summary does not have to transfer the entire order document. Using different column families for materialized views is a common feature of column-family databases. An advantage of doing this is that it allows you to update the materialized view within the same atomic operation.
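As a rough illustration of the eager and batch strategies described in this section, the following Python sketch keeps a “units sold per product” view next to a dictionary of order aggregates. Everything here is assumed for the example: the in-memory dictionaries stand in for a database, and the `quantity` field (defaulting to 1) is our own addition to the order items.

```python
from collections import defaultdict

orders = {}                          # base data: orderId -> order aggregate
units_sold = defaultdict(int)        # materialized view: productId -> units sold


def add_order_eager(order_id, order):
    """Eager approach: update the view in the same step as the base data."""
    orders[order_id] = order
    for item in order["orderItems"]:
        units_sold[item["productId"]] += item.get("quantity", 1)


def rebuild_view_batch():
    """Batch approach: recompute the whole view by reading every order."""
    fresh = defaultdict(int)
    for order in orders.values():
        for item in order["orderItems"]:
            fresh[item["productId"]] += item.get("quantity", 1)
    return fresh


add_order_eager(99, {"orderItems": [{"productId": 27, "quantity": 2}]})
add_order_eager(545, {"orderItems": [{"productId": 27}, {"productId": 29}]})
```

With the eager function the product manager’s question becomes a single lookup in `units_sold`; the batch function gives the same answer but only as fresh as its last run.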

3.5. Modeling for Data Access

As mentioned earlier, when modeling data aggregates we need to consider how the data is going to be read as well as what the side effects are on data related to those aggregates.

Let’s start with the model where all the data for the customer is embedded using a key-value store (see Figure 3.2).

Figure 3.2. Embed all the objects for customer and their orders.

In this scenario, the application can read the customer’s information and all the related data by using the key. If the requirements are to read the orders or the products sold in each order, the whole object has to be read and then parsed on the client side to build the results. When references are needed, we could switch to document stores and then query inside the documents, or even change the data for the key-value store to split the value object into Customer and Order objects and then maintain these objects’ references to each other.

With the references (see Figure 3.3), we can now find the orders independently from the Customer, and with the orderId reference in the Customer we can find all Orders for the Customer. Using aggregates this way allows for read optimization, but we have to push the orderId reference into Customer every time a new Order is created.

    # Customer object
    {
      "customerId": 1,
      "customer": {
        "name": "Martin",
        "billingAddress": [{"city": "Chicago"}],
        "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}],
        "orders": [{"orderId": 99}]
      }
    }

    # Order object
    {
      "customerId": 1,
      "orderId": 99,
      "order": {
        "orderDate": "Nov-20-2011",
        "orderItems": [{"productId": 27, "price": 32.45}],
        "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
        "shippingAddress": {"city": "Chicago"}
      }
    }

Figure 3.3. Customer is stored separately from Order.

Aggregates can also be used to obtain analytics; for example, an aggregate update may fill in information on which Orders have a given Product in them. This denormalization of the data allows for fast access to the data we are interested in and is the basis for Real Time BI or Real Time Analytics where enterprises don’t have to rely on end-of-the-day batch runs to populate data warehouse tables and generate analytics; now they can fill in this type of data, for multiple types of requirements, when the order is placed by the customer.

    {"itemid": 27, "orders": [99, 545, 897, 678]}
    {"itemid": 29, "orders": [199, 545, 704, 819]}
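The bookkeeping that these reference-carrying aggregates require can be sketched as follows. This is a minimal Python illustration, not a real client: a plain dictionary stands in for the key-value store, and the key prefixes (`customer:`, `order:`, `item:`) and the `place_order` function are names we invented for the example.

```python
store = {}  # a plain dict stands in for the key-value store


def place_order(customer_id, order_id, product_ids):
    """Write the Order, then maintain the references that reads depend on."""
    store[f"order:{order_id}"] = {
        "customerId": customer_id,
        "orderId": order_id,
        "orderItems": [{"productId": p} for p in product_ids],
    }
    # Push the orderId reference into the Customer aggregate.
    store[f"customer:{customer_id}"]["orders"].append({"orderId": order_id})
    # Fill in the item -> orders aggregate used for analytics.
    for product_id in product_ids:
        item = store.setdefault(f"item:{product_id}",
                                {"itemid": product_id, "orders": []})
        item["orders"].append(order_id)


store["customer:1"] = {"customerId": 1, "name": "Martin", "orders": []}
place_order(1, 99, [27])
place_order(1, 545, [27, 29])
```

Note that one logical operation now touches several aggregates. In a real store these writes would not usually be atomic as a group, which is part of the cost of optimizing for reads this way.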

In document stores, since we can query inside documents, removing references to Orders from the Customer object is possible. This change allows us to not update the Customer object when new orders are placed by the Customer.

    # Customer object
    {
      "customerId": 1,
      "name": "Martin",
      "billingAddress": [{"city": "Chicago"}],
      "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}]
    }

    # Order object
    {
      "orderId": 99,
      "customerId": 1,
      "orderDate": "Nov-20-2011",
      "orderItems": [{"productId": 27, "price": 32.45}],
      "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
      "shippingAddress": {"city": "Chicago"}
    }

Since document data stores allow you to query by attributes inside the document, searches such as “find all orders that include the Refactoring Databases product” are possible, but the decision to create an aggregate of items and orders they belong to is not based on the database’s query capability but on the read optimization desired by the application.

When modeling for column-family stores, we have the benefit of the columns being ordered, allowing us to name columns that are frequently used so that they are fetched first. When using the column families to model the data, it is important to remember to do it per your query requirements and not for the purpose of writing; the general rule is to make it easy to query and denormalize the data during write.

As you can imagine, there are multiple ways to model the data; one way is to store the Customer and Order in different column families (see Figure 3.4). Here, it is important to note that the references to all the orders placed by the customer are in the Customer column family. Other similar denormalizations are generally done so that query (read) performance is improved.

Figure 3.4. Conceptual view into a column data store

When using graph databases to model the same data, we model all objects as nodes and relations within them as relationships; these relationships have types and directional significance.

Each node has independent relationships with other nodes. These relationships have names like PURCHASED, PAID_WITH, or BELONGS_TO (see Figure 3.5); these relationship names let you traverse the graph. Let’s say you want to find all the Customers who PURCHASED a product with the name Refactoring Databases. All we need to do is query for the product node Refactoring Databases and look for all the Customers with the incoming PURCHASED relationship.

Figure 3.5. Graph model of e-commerce data

This type of relationship traversal is very easy with graph databases. It is especially convenient when you need to use the data to recommend products to users or to find patterns in actions taken by users.
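The incoming-relationship query described above can be sketched in a few lines of Python. This is only an approximation of what a graph database does for you: the relationship triples are sample data we made up, and indexing them by target node and type is our assumption about one simple way to make “incoming PURCHASED” a direct lookup.

```python
from collections import defaultdict

# (source node, relationship type, target node); direction runs source -> target
relationships = [
    ("Martin", "PURCHASED", "Refactoring Databases"),
    ("Pramod", "PURCHASED", "Refactoring Databases"),
    ("Martin", "PURCHASED", "NoSQL Distilled"),
    ("Martin", "PAID_WITH", "debit card"),
]

# Index relationships by (target, type) so incoming edges are a direct lookup.
incoming = defaultdict(list)
for source, rel_type, target in relationships:
    incoming[(target, rel_type)].append(source)


def customers_who_purchased(product_name):
    """Start at the product node and follow incoming PURCHASED relationships."""
    return sorted(incoming.get((product_name, "PURCHASED"), []))


buyers = customers_who_purchased("Refactoring Databases")
```

Because relationships carry a type and a direction, the same data answers other questions (for example, following PAID_WITH outward from a customer) without any change to how it is stored.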

3.6. Key Points

• Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra-aggregate relationships.

• Graph databases organize data into node and edge graphs; they work best for data that has complex relationship structures.

• Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data.

• Aggregate-oriented databases often compute materialized views to provide data organized differently from their primary aggregates. This is often done with map-reduce computations.

Chapter 4. Distribution Models

The primary driver of interest in NoSQL has been its ability to run databases on a large cluster. As data volumes increase, it becomes more difficult and expensive to scale up (buy a bigger server to run the database on). A more appealing option is to scale out (run the database on a cluster of servers). Aggregate orientation fits well with scaling out because the aggregate is a natural unit to use for distribution.

Depending on your distribution model, you can get a data store that will give you the ability to handle larger quantities of data, the ability to process greater read or write traffic, or more availability in the face of network slowdowns or breakages. These are often important benefits, but they come at a cost. Running over a cluster introduces complexity, so it’s not something to do unless the benefits are compelling.

Broadly, there are two paths to data distribution: replication and sharding. Replication takes the same data and copies it over multiple nodes. Sharding puts different data on different nodes. Replication and sharding are orthogonal techniques: You can use either or both of them. Replication comes in two forms: master-slave and peer-to-peer. We will now discuss these techniques in turn: first single-server, then sharding, then master-slave replication, and finally peer-to-peer replication.

4.1. Single Server

The first and the simplest distribution option is the one we would most often recommend: no distribution at all. Run the database on a single machine that handles all the reads and writes to the data store. We prefer this option because it eliminates all the complexities that the other options introduce; it’s easy for operations people to manage and easy for application developers to reason about.

Although a lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application. Graph databases are the obvious category here; these work best in a single-server configuration. If your data usage is mostly about processing aggregates, then a single-server document or key-value store may well be worthwhile because it’s easier on application developers.

For the rest of this chapter we’ll be wading through the advantages and complications of more sophisticated distribution schemes. Don’t let the volume of words fool you into thinking that we would prefer these options. If we can get away without distributing our data, we will always choose a single-server approach.

4.2. Sharding

Often, a busy data store is busy because different people are accessing different parts of the dataset. In these circumstances we can support horizontal scalability by putting different parts of the data onto different servers, a technique that’s called sharding (see Figure 4.1).

Figure 4.1. Sharding puts different data on separate nodes, each of which does its own reads and writes.

In the ideal case, we have different users all talking to different server nodes. Each user only has to talk to one server, so gets rapid responses from that server. The load is balanced out nicely between servers; for example, if we have ten servers, each one only has to handle 10% of the load.

Of course the ideal case is a pretty rare beast. In order to get close to it we have to ensure that data that’s accessed together is clumped together on the same node and that these clumps are arranged on the nodes to provide the best data access.

The first part of this question is how to clump the data up so that one user mostly gets her data from a single server. This is where aggregate orientation comes in really handy. The whole point of aggregates is that we design them to combine data that’s commonly accessed together, so aggregates leap out as an obvious unit of distribution.

When it comes to arranging the data on the nodes, there are several factors that can help improve performance. If you know that most accesses of certain aggregates are based on a physical location, you can place the data close to where it’s being accessed. If you have orders for someone who lives in Boston, you can place that data in your eastern US data center.

Another factor is trying to keep the load even. This means that you should try to arrange aggregates so they are evenly distributed across the nodes, which then all get equal amounts of the load. This may vary over time, for example if some data tends to be accessed on certain days of the week, so there may be domain-specific rules you’d like to use.

In some cases, it’s useful to put aggregates together if you think they may be read in sequence. The Bigtable paper [Chang etc.] described keeping its rows in lexicographic order and sorting web addresses based on reversed domain names (e.g., com.martinfowler). This way data for multiple pages could be accessed together to improve processing efficiency.

Historically most people have done sharding as part of application logic. You might put all customers with surnames starting from A to D on one shard and E to G on another. This complicates the programming model, as application code needs to ensure that queries are distributed across the various shards. Furthermore, rebalancing the sharding means changing the application code and migrating the data. Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of allocating data to shards and ensuring that data access goes to the right shard. This can make it much easier to use sharding in an application.

Sharding is particularly valuable for performance because it can improve both read and write performance. Using replication, particularly with caching, can greatly improve read performance but does little for applications that have a lot of writes. Sharding provides a way to horizontally scale writes.

Sharding does little to improve resilience when used alone. Although the data is on different nodes, a node failure makes that shard’s data unavailable just as surely as it does for a single-server solution. The resilience benefit it does provide is that only the users of the data on that shard will suffer; however, it’s not good to have a database with part of its data missing. With a single server it’s easier to pay the effort and cost to keep that server up and running; clusters usually try to use less reliable machines, and you’re more likely to get a node failure. So in practice, sharding alone is likely to decrease resilience.

Despite the fact that sharding is made much easier with aggregates, it’s still not a step to be taken lightly. Some databases are intended from the beginning to use sharding, in which case it’s wise to run them on a cluster from the very beginning of development, and certainly in production. Other databases use sharding as a deliberate step up from a single-server configuration, in which case it’s best to start single-server and only use sharding once your load projections clearly indicate that you are running out of headroom.

In any case the step from a single node to sharding is going to be tricky. We have heard tales of teams getting into trouble because they left sharding until very late, so when they turned it on in production their database became essentially unavailable because the sharding support consumed all the database resources for moving the data onto new shards. The lesson here is to use sharding well before you need to, when you have enough headroom to carry out the sharding.
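The difference between sharding in application logic and letting a routing function decide can be sketched briefly. In the Python below, the surname ranges come from the example in this section, while the shard names, the `shard_by_hash` function, and the use of a SHA-256 hash are our own assumptions; hashing the key is one common way to spread aggregates evenly, not a description of how any specific database implements auto-sharding.

```python
import hashlib

# Application-level sharding rule from the text: A to D on one shard, E to G on another.
SURNAME_RANGES = [("A", "D", "shard-1"), ("E", "G", "shard-2")]


def shard_by_surname(surname):
    """Route by the first letter of the surname; unknown ranges are an error."""
    initial = surname[0].upper()
    for low, high, shard in SURNAME_RANGES:
        if low <= initial <= high:
            return shard
    raise KeyError(f"no shard configured for surnames starting with {initial!r}")


def shard_by_hash(aggregate_key, shard_count):
    """Route by a stable hash of the aggregate's key (an assumed alternative)."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


surname_shard = shard_by_surname("Fowler")
hash_shard = shard_by_hash("customer:1", 10)
```

The surname rule shows why rebalancing hurts: changing the ranges means changing `SURNAME_RANGES` in every application and moving the affected data. The hash rule balances load more evenly but gives up the ability to keep related aggregates next to each other.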

4.3. Master-Slave Replication

With master-slave distribution, you replicate data across multiple nodes. One node is designated as the master, or primary. This master is the authoritative source for the data and is usually responsible for processing any updates to that data. The other nodes are slaves, or secondaries. A replication process synchronizes the slaves with the master (see Figure 4.2).

Figure 4.2. Data is replicated from master to slaves. The master services all writes; reads may come from either master or slaves.

Master-slave replication is most helpful for scaling when you have a read-intensive dataset. You can scale horizontally to handle more read requests by adding more slave nodes and ensuring that all read requests are routed to the slaves. You are still, however, limited by the ability of the master to process updates and its ability to pass those updates on. Consequently it isn’t such a good scheme for datasets with heavy write traffic, although offloading the read traffic will help a bit with handling the write load.

A second advantage of master-slave replication is read resilience: Should the master fail, the slaves can still handle read requests. Again, this is useful if most of your data access is reads. The failure of the master does eliminate the ability to handle writes until either the master is restored or a new master is appointed. However, having slaves as replicas of the master does speed up recovery after a failure of the master since a slave can be appointed a new master very quickly.

The ability to appoint a slave to replace a failed master means that master-slave replication is useful even if you don’t need to scale out. All read and write traffic can go to the master while the slave acts as a hot backup. In this case it’s easiest to think of the system as a single-server store with a hot backup. You get the convenience of the single-server configuration but with greater resilience, which is particularly handy if you want to be able to handle server failures gracefully.

Masters can be appointed manually or automatically. Manual appointing typically means that when you configure your cluster, you configure one node as the master. With automatic appointment, you create a cluster of nodes and they elect one of themselves to be the master. Apart from simpler configuration, automatic appointment means that the cluster can automatically appoint a new master when a master fails, reducing downtime.

In order to get read resilience, you need to ensure that the read and write paths into your application are different, so that you can handle a failure in the write path and still read. This includes such things as putting the reads and writes through separate database connections, a facility that is not often supported by database interaction libraries. As with any feature, you cannot be sure you have read resilience without good tests that disable the writes and check that reads still occur.

Replication comes with some alluring benefits, but it also comes with an inevitable dark side: inconsistency. You have the danger that different clients, reading different slaves, will see different values because the changes haven’t all propagated to the slaves. In the worst case, that can mean that a client cannot read a write it just made. Even if you use master-slave replication just for hot backup this can be a concern, because if the master fails, any updates not passed on to the backup are lost. We’ll talk about how to deal with these issues later (“Consistency,” p. 47).
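A toy model can show both the read resilience and the inconsistency described in this section. The Python sketch below is deliberately simplistic and entirely hypothetical: `Node` and `MasterSlaveStore` are invented classes, replication is a method we call by hand, and a failure is simulated by flipping a flag.

```python
class Node:
    def __init__(self):
        self.data = {}
        self.up = True


class MasterSlaveStore:
    def __init__(self, slave_count=2):
        self.master = Node()
        self.slaves = [Node() for _ in range(slave_count)]
        self.pending = []  # updates accepted by the master but not yet copied

    def write(self, key, value):
        # Write path: goes only through the master.
        if not self.master.up:
            raise RuntimeError("master unavailable: writes are rejected")
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Replication process: copy outstanding updates to every slave.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read(self, key, slave_index=0):
        # Read path: separate from the write path, served by a slave.
        return self.slaves[slave_index].data.get(key)


cluster = MasterSlaveStore()
cluster.write("phone", "555-0100")
stale_read = cluster.read("phone")        # update has not propagated yet
cluster.replicate()
fresh_read = cluster.read("phone")
cluster.master.up = False                 # simulate a master failure
read_after_failure = cluster.read("phone", slave_index=1)
try:
    cluster.write("phone", "555-0199")
    write_accepted = True
except RuntimeError:
    write_accepted = False
```

The last few lines are, in miniature, the kind of test recommended above: disable the write path and check that reads still occur.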

4.4. Peer-to-Peer Replication

Master-slave replication helps with read scalability but doesn’t help with scalability of writes. It provides resilience against failure of a slave, but not of a master. Essentially, the master is still a bottleneck and a single point of failure. Peer-to-peer replication (see Figure 4.3) attacks these problems by not having a master. All the replicas have equal weight, they can all accept writes, and the loss of any of them doesn’t prevent access to the data store.

Figure 4.3. Peer-to-peer replication has all nodes applying reads and writes to all the data.

The prospect here looks mighty fine. With a peer-to-peer replication cluster, you can ride over node failures without losing access to data. Furthermore, you can easily add nodes to improve your performance. There’s much to like here, but there are complications.

The biggest complication is, again, consistency. When you can write to two different places, you run the risk that two people will attempt to update the same record at the same time: a write-write conflict. Inconsistencies on read lead to problems but at least they are relatively transient. Inconsistent writes are forever.

We’ll talk more about how to deal with write inconsistencies later on, but for the moment we’ll note a couple of broad options. At one end, we can ensure that whenever we write data, the replicas coordinate to ensure we avoid a conflict. This can give us just as strong a guarantee as a master, albeit at the cost of network traffic to coordinate the writes. We don’t need all the replicas to agree on the write, just a majority, so we can still survive losing a minority of the replica nodes.

At the other extreme, we can decide to cope with an inconsistent write. There are contexts when we can come up with a policy to merge inconsistent writes. In this case we can get the full performance benefit of writing to any replica.

These points are at the ends of a spectrum where we trade off consistency for availability.
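The “just a majority” rule is simple arithmetic, and a short sketch may help. In the hypothetical Python function below, each replica is a dictionary and `None` marks a replica that cannot be reached; a real system would also need versioning and a way to bring lagging replicas up to date, which we leave out.

```python
def quorum_write(replicas, key, value):
    """Apply a write only if a majority of replicas can take part.

    replicas: list of dicts, with None marking an unreachable replica.
    """
    needed = len(replicas) // 2 + 1          # majority of all replicas
    reachable = [r for r in replicas if r is not None]
    if len(reachable) < needed:
        return False                         # refuse rather than risk a conflict
    for replica in reachable:
        replica[key] = value
    return True


three_with_one_down = [{}, {}, None]
accepted = quorum_write(three_with_one_down, "room-101", "booked")

three_with_two_down = [{}, None, None]
rejected = quorum_write(three_with_two_down, "room-101", "booked")
```

With three replicas the majority is two, so the write survives the loss of one node but is refused when two are gone. Refusing the write is the consistency end of the spectrum; accepting it on the single surviving replica and merging later would be the availability end.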

4.5. Combining Sharding and Replication

Replication and sharding are strategies that can be combined. If we use both master-slave replication and sharding (see Figure 4.4), this means that we have multiple masters, but each data item only has a single master. Depending on your configuration, you may choose a node to be a master for some data and a slave for others, or you may dedicate nodes for master or slave duties.

Figure 4.4. Using master-slave replication together with sharding

Using peer-to-peer replication and sharding is a common strategy for column-family databases. In a scenario like this you might have tens or hundreds of nodes in a cluster with data sharded over them. A good starting point for peer-to-peer replication is to have a replication factor of 3, so each shard is present on three nodes. Should a node fail, then the shards on that node will be built on the other nodes (see Figure 4.5).

Figure 4.5. Using peer-to-peer replication together with sharding
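One way to picture a replication factor of 3 is to lay the nodes out in a ring and place each shard on the next three live nodes. The Python sketch below assumes exactly that placement rule; the ring layout, the node names, and the `replica_nodes` function are illustrative choices of ours, not the scheme of any particular database.

```python
def replica_nodes(shard_id, nodes, failed=frozenset(), replication_factor=3):
    """Pick the nodes holding a shard: walk the ring, skipping failed nodes."""
    start = shard_id % len(nodes)
    live_in_ring_order = []
    for step in range(len(nodes)):
        node = nodes[(start + step) % len(nodes)]
        if node not in failed:
            live_in_ring_order.append(node)
    return live_in_ring_order[:replication_factor]


nodes = ["n0", "n1", "n2", "n3", "n4"]
normal_placement = replica_nodes(1, nodes)
placement_after_failure = replica_nodes(1, nodes, failed={"n2"})
```

Under this rule shard 1 normally lives on n1, n2, and n3. When n2 fails, the shard is rebuilt on n4 so that three copies exist again, which is the behavior sketched in Figure 4.5.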

4.6. Key Points

• There are two styles of distributing data:

  • Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.

  • Replication copies data across multiple servers, so each bit of data can be found in multiple places.

  A system may use either or both techniques.

• Replication comes in two forms:

  • Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.

  • Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.

  Master-slave replication reduces the chance of update conflicts but peer-to-peer replication avoids loading all writes onto a single point of failure.

Chapter 5. Consistency

One of the biggest changes from a centralized relational database to a cluster-oriented NoSQL database is in how you think about consistency. Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies that we’ll shortly be discussing. Once you start looking at the NoSQL world, phrases such as “CAP theorem” and “eventual consistency” appear, and as soon as you start building something you have to think about what sort of consistency you need for your system.

Consistency comes in various forms, and that one word covers a myriad of ways errors can creep into your life. So we’re going to begin by talking about the various shapes consistency can take. After that we’ll discuss why you may want to relax consistency (and its big sister, durability).

5.1. Update Consistency

We’ll begin by considering updating a telephone number. Coincidentally, Martin and Pramod are looking at the company website and notice that the phone number is out of date. Implausibly, they both have update access, so they both go in at the same time to update the number. To make the example interesting, we’ll assume they update it slightly differently, because each uses a slightly different format. This issue is called a write-write conflict: two people updating the same data item at the same time.

When the writes reach the server, the server will serialize them: decide to apply one, then the other. Let’s assume it uses alphabetical order and picks Martin’s update first, then Pramod’s. Without any concurrency control, Martin’s update would be applied and immediately overwritten by Pramod’s. In this case Martin’s is a lost update. Here the lost update is not a big problem, but often it is. We see this as a failure of consistency because Pramod’s update was based on the state before Martin’s update, yet was applied after it.

Approaches for maintaining consistency in the face of concurrency are often described as pessimistic or optimistic. A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out. For update conflicts, the most common pessimistic approach is to have write locks, so that in order to change a value you need to acquire a lock, and the system ensures that only one client can get a lock at a time. So Martin and Pramod would both attempt to acquire the write lock, but only Martin (the first one) would succeed. Pramod would then see the result of Martin’s write before deciding whether to make his own update.

A common optimistic approach is a conditional update where any client that does an update tests the value just before updating it to see if it’s changed since its last read. In this case, Martin’s update would succeed but Pramod’s would fail. The error would let Pramod know that he should look at the value again and decide whether to attempt a further update.

Both the pessimistic and optimistic approaches that we’ve just described rely on a consistent serialization of the updates. With a single server, this is obvious: it has to choose one, then the other. But if there’s more than one server, such as with peer-to-peer replication, then two nodes might apply the updates in a different order, resulting in a different value for the telephone number on each peer. Often, when people talk about concurrency in distributed systems, they talk about sequential consistency: ensuring that all nodes apply operations in the same order.

There is another optimistic way to handle a write-write conflict: save both updates and record that they are in conflict. This approach is familiar to many programmers from version control systems, particularly distributed version control systems that by their nature will often have conflicting commits. The next step again follows from version control: You have to merge the two updates somehow. Maybe you show both values to the user and ask them to sort it out; this is what happens if you update the same contact on your phone and your computer. Alternatively, the computer may be able to perform the merge itself; if it was a phone formatting issue, it may be able to realize that and apply the new number with the standard format. Any automated merge of write-write conflicts is highly domain-specific and needs to be programmed for each particular case.

Often, when people first encounter these issues, their reaction is to prefer pessimistic concurrency because they are determined to avoid conflicts. While in some cases this is the right answer, there is always a tradeoff. Concurrent programming involves a fundamental tradeoff between safety (avoiding errors such as update conflicts) and liveness (responding quickly to clients). Pessimistic approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for its purpose. This problem is made worse by the danger of errors: pessimistic concurrency often leads to deadlocks, which are hard to prevent and debug.

Replication makes it much more likely to run into write-write conflicts. If different nodes have different copies of some data which can be independently updated, then you’ll get conflicts unless you take specific measures to avoid them. Using a single node as the target for all writes for some data makes it much easier to maintain update consistency. Of the distribution models we discussed earlier, all but peer-to-peer replication do this.
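The conditional update from the telephone number example can be sketched with a version counter. This is one possible way to detect that a value changed since it was read, and the counter is our assumption rather than something the example above prescribes; the `Record` class, the `VersionConflict` exception, and the phone numbers are likewise invented for illustration.

```python
class VersionConflict(Exception):
    """Raised when the record changed after the caller last read it."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def conditional_update(self, new_value, expected_version):
        # A real store would perform this check and write as one atomic step.
        if self.version != expected_version:
            raise VersionConflict("value changed since it was last read")
        self.value = new_value
        self.version += 1


phone = Record("555 0100")
_, martin_version = phone.read()
_, pramod_version = phone.read()

phone.conditional_update("555-0123", martin_version)       # Martin succeeds
try:
    phone.conditional_update("(555) 0123", pramod_version)
    pramod_succeeded = True
except VersionConflict:
    pramod_succeeded = False                                # Pramod must re-read
```

Nobody waits for a lock here, which is the liveness advantage of the optimistic approach; the price is that Pramod has to handle the failure and decide what to do next.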

5.2.ReadConsistencyHavingadatastorethatmaintainsupdateconsistencyisonething,butitdoesn’tguaranteethatreadersofthatdatastorewillalwaysgetconsistentresponsestotheirrequests.Let’simaginewehaveanorderwithlineitemsandashippingcharge.Theshippingchargeiscalculatedbasedonthelineitemsintheorder.Ifweaddalineitem,wethusalsoneedtorecalculateandupdatetheshippingcharge.Inarelationaldatabase,theshippingchargeandlineitemswillbeinseparatetables.ThedangerofinconsistencyisthatMartinaddsalineitemtohisorder,Pramodthenreadsthelineitemsandshippingcharge,andthenMartinupdatestheshippingcharge.Thisisaninconsistentreadorread-writeconflict:InFigure5.1PramodhasdoneareadinthemiddleofMartin’swrite.

Figure 5.1. A read-write conflict in logical consistency

We refer to this type of consistency as logical consistency: ensuring that different data items make sense together. To avoid a logically inconsistent read-write conflict, relational databases support the notion of transactions. Providing Martin wraps his two writes in a transaction, the system guarantees that Pramod will either read both data items before the update or both after the update.

A common claim we hear is that NoSQL databases don’t support transactions and thus can’t be consistent. Such a claim is mostly wrong because it glosses over lots of important details. Our first clarification is that any statement about lack of transactions usually only applies to some NoSQL databases, in particular the aggregate-oriented ones. In contrast, graph databases tend to support ACID transactions just the same as relational databases.

Secondly, aggregate-oriented databases do support atomic updates, but only within a single aggregate. This means that you will have logical consistency within an aggregate but not between aggregates. So in the example, you could avoid running into that inconsistency if the order, the shipping charge, and the line items are all part of a single order aggregate.

Of course not all data can be put in the same aggregate, so any update that affects multiple aggregates leaves open a time when clients could perform an inconsistent read. The length of time an inconsistency is present is called the inconsistency window. A NoSQL system may have a quite short inconsistency window: As one data point, Amazon’s documentation says that the inconsistency window for its SimpleDB service is usually less than a second.

This example of a logically inconsistent read is the classic example that you’ll see in any book that touches database programming. Once you introduce replication, however, you get a whole new kind of inconsistency. Let’s imagine there’s one last hotel room for a desirable event. The hotel reservation system runs on many nodes. Martin and Cindy are a couple considering this room, but they are discussing this on the phone because Martin is in London and Cindy is in Boston. Meanwhile Pramod, who is in Mumbai, goes and books that last room. That updates the replicated room availability, but the update gets to Boston quicker than it gets to London. When Martin and Cindy fire up their browsers to see if the room is available, Cindy sees it booked and Martin sees it free. This is another inconsistent read, but it’s a breach of a different form of consistency we call replication consistency: ensuring that the same data item has the same value when read from different replicas (see Figure 5.2).

Figure 5.2. An example of replication inconsistency

Eventually, of course, the updates will propagate fully, and Martin will see the room is fully booked. Therefore this situation is generally referred to as eventually consistent, meaning that at any time nodes may have replication inconsistencies but, if there are no further updates, eventually all nodes will be updated to the same value. Data that is out of date is generally referred to as stale, which reminds us that a cache is another form of replication, essentially following the master-slave distribution model.

Although replication consistency is independent of logical consistency, replication can exacerbate a logical inconsistency by lengthening its inconsistency window. Two different updates on the master may be performed in rapid succession, leaving an inconsistency window of milliseconds. But delays in networking could mean that the same inconsistency window lasts for much longer on a slave.

Consistency guarantees aren’t something that’s global to an application. You can usually specify the level of consistency you want with individual requests. This allows you to use weak consistency most of the time when it isn’t an issue, but request strong consistency when it is.

The presence of an inconsistency window means that different people will see different things at the same time. If Martin and Cindy are looking at rooms while on a transatlantic call, it can cause confusion. It’s more common for users to act independently, and then this is not a problem. But inconsistency windows can be particularly problematic when you get inconsistencies with yourself. Consider the example of posting comments on a blog entry. Few people are going to worry about inconsistency windows of even a few minutes while people are typing in their latest thoughts. Often, systems handle the load of such sites by running on a cluster and load-balancing incoming requests to different nodes. Therein lies a danger: You may post a message using one node, then refresh your browser, but the refresh goes to a different node which hasn’t received your post yet, and it looks like your post was lost.

In situations like this, you can tolerate reasonably long inconsistency windows, but you need read-your-writes consistency, which means that, once you’ve made an update, you’re guaranteed to continue seeing that update. One way to get this in an otherwise eventually consistent system is to provide session consistency: Within a user’s session there is read-your-writes consistency. This does mean that the user may lose that consistency should their session end for some reason or should the user access the same system simultaneously from different computers, but these cases are relatively rare.

There are a couple of techniques to provide session consistency. A common way, and often the easiest way, is to have a sticky session: a session that’s tied to one node (this is also called session affinity). A sticky session allows you to ensure that as long as you keep read-your-writes consistency on a node, you’ll get it for sessions too. The downside is that sticky sessions reduce the ability of the load balancer to do its job.

Another approach for session consistency is to use version stamps (“Version Stamps,” p. 61) and ensure every interaction with the data store includes the latest version stamp seen by a session. The server node must then ensure that it has the updates that include that version stamp before responding to a request.

Maintaining session consistency with sticky sessions and master-slave replication can be awkward if you want to read from the slaves to improve read performance but still need to write to the master. One way of handling this is for writes to be sent to the slave, which then takes responsibility for forwarding them to the master while maintaining session consistency for its client. Another approach is to switch the session to the master temporarily when doing a write, just long enough that reads are done from the master until the slaves have caught up with the update.

We’re talking about replication consistency in the context of a data store, but it’s also an important factor in overall application design. Even a simple database system will have lots of occasions where data is presented to a user, the user cogitates, and then updates that data. It’s usually a bad idea to keep a transaction open during user interaction because there’s a real danger of conflicts when the user tries to make her update, which leads to such approaches as offline locks [Fowler PoEAA].
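The version-stamp approach to session consistency can be sketched as follows. This is a toy model under stated assumptions: the Replica and Session classes are hypothetical, the stamp is a simple counter, and only one client writes, so it shows the idea rather than any product's mechanism.

```python
class Replica:
    """A hypothetical replica holding one value and a counter version stamp."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Replication delivers updates; ignore anything not newer than what we hold.
        if version > self.version:
            self.version, self.value = version, value

    def read(self, min_version):
        # Refuse to answer until this replica has caught up with the session.
        if self.version < min_version:
            raise LookupError("replica is behind the session")
        return self.version, self.value


class Session:
    """Remembers the latest version stamp this client has seen."""

    def __init__(self):
        self.last_seen = 0

    def write(self, replica, value):
        replica.apply(replica.version + 1, value)
        self.last_seen = replica.version

    def read(self, replicas):
        # Try replicas in order and skip any that are still stale for this session.
        for replica in replicas:
            try:
                version, value = replica.read(self.last_seen)
            except LookupError:
                continue
            self.last_seen = version
            return value
        raise LookupError("no replica has caught up yet")


a, b = Replica(), Replica()
session = Session()
session.write(a, "my comment")   # the write lands on node a only
print(session.read([b, a]))      # b is stale, so the read falls through to a
b.apply(a.version, a.value)      # replication catches up
print(session.read([b]))         # now b can serve the session
```

A real node would more likely wait or forward the request than raise an error; the exception simply keeps the sketch short.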

5.3. Relaxing Consistency

Consistency is a Good Thing, but, sadly, sometimes we have to sacrifice it. It is always possible to design a system to avoid inconsistencies, but often impossible to do so without making unbearable sacrifices in other characteristics of the system. As a result, we often have to trade off consistency for something else. While some architects see this as a disaster, we see it as part of the inevitable tradeoffs involved in system design. Furthermore, different domains have different tolerances for inconsistency, and we need to take this tolerance into account as we make our decisions.

Trading off consistency is a familiar concept even in single-server relational database systems. Here, our principal tool to enforce consistency is the transaction, and transactions can provide strong consistency guarantees. However, transaction systems usually come with the ability to relax isolation levels, allowing queries to read data that hasn’t been committed yet, and in practice we see most applications relax consistency down from the highest isolation level (serializable) in order to get effective performance. We most commonly see people using the read-committed transaction level, which eliminates some read-write conflicts but allows others.

Many systems forgo transactions entirely because the performance impact of transactions is too high. We’ve seen this in a couple of different ways. On a small scale, we saw the popularity of MySQL during the days when it didn’t support transactions. Many websites liked the high speed of MySQL and were prepared to live without transactions. At the other end of the scale, some very large websites, such as eBay [Pritchett], have had to forgo transactions in order to perform acceptably; this is particularly true when you need to introduce sharding. Even without these constraints, many application builders need to interact with remote systems that can’t be properly included within a transaction boundary, so updating outside of transactions is a quite common occurrence for enterprise applications.

5.3.1. The CAP Theorem

In the NoSQL world it’s common to refer to the CAP theorem as the reason why you may need to relax consistency. It was originally proposed by Eric Brewer in 2000 [Brewer] and given a formal proof by Seth Gilbert and Nancy Lynch [Lynch and Gilbert] a couple of years later. (You may also hear this referred to as Brewer’s Conjecture.)

The basic statement of the CAP theorem is that, given the three properties of Consistency, Availability, and Partition tolerance, you can only get two. Obviously this depends very much on how you define these three properties, and differing opinions have led to several debates on what the real consequences of the CAP theorem are.

Consistency is pretty much as we’ve defined it so far. Availability has a particular meaning in the context of CAP: it means that if you can talk to a node in the cluster, it can read and write data. That’s subtly different from the usual meaning, which we’ll explore later. Partition tolerance means that the cluster can survive communication breakages in the cluster that separate the cluster into multiple partitions unable to communicate with each other (a situation known as a split brain, see Figure 5.3).

Figure 5.3. With two breaks in the communication lines, the network partitions into two groups.

A single-server system is the obvious example of a CA system: a system that has Consistency and Availability but not Partition tolerance. A single machine can’t partition, so it does not have to worry about partition tolerance. There’s only one node, so if it’s up, it’s available. Being up and keeping consistency is reasonable. This is the world that most relational database systems live in.

It is theoretically possible to have a CA cluster. However, this would mean that if a partition ever occurs in the cluster, all the nodes in the cluster would go down so that no client can talk to a node. By the usual definition of “available,” this would mean a lack of availability, but this is where CAP’s special usage of “availability” gets confusing. CAP defines “availability” to mean “every request received by a nonfailing node in the system must result in a response” [Lynch and Gilbert]. So a failed, unresponsive node doesn’t imply a lack of CAP availability.

This does imply that you can build a CA cluster, but you have to ensure it will only partition rarely and completely. This can be done, at least within a data center, but it’s usually prohibitively expensive. Remember that in order to bring down all the nodes in a cluster on a partition, you also have to detect the partition in a timely manner, which itself is no small feat.

So clusters have to be tolerant of network partitions. And here is the real point of the CAP theorem. Although the CAP theorem is often stated as “you can only get two out of three,” in practice what it’s saying is that in a system that may suffer partitions, as distributed systems do, you have to trade off consistency versus availability. This isn’t a binary decision; often, you can trade off a little consistency to get some availability. The resulting system would be neither perfectly consistent nor perfectly available, but would have a combination that is reasonable for your particular needs.

An example should help illustrate this. Martin and Pramod are both trying to book the last hotel room on a system that uses peer-to-peer distribution with two nodes (London for Martin and Mumbai for Pramod). If we want to ensure consistency, then when Martin tries to book his room on the London node, that node must communicate with the Mumbai node before confirming the booking. Essentially, both nodes must agree on the serialization of their requests. This gives us consistency, but should the network link break, then neither system can book any hotel room, sacrificing availability.

One way to improve availability is to designate one node as the master for a particular hotel and ensure all bookings are processed by that master. Should that master be Mumbai, then Mumbai can still process hotel bookings for that hotel and Pramod will get the last room. If we use master-slave replication, London users can see the inconsistent room information but cannot make a booking and thus cause an update inconsistency. However, users expect that it could happen in this situation, so, again, the compromise works for this particular use case.

This improves the situation, but we still can’t book a room on the London node for the hotel whose master is in Mumbai if the connection goes down. In CAP terminology, this is a failure of availability in that Martin can talk to the London node but the London node cannot update the data. To gain more availability, we might allow both systems to keep accepting hotel reservations even if the network link breaks down. The danger here is that Martin and Pramod both book the last hotel room. However, depending on how this hotel operates, that may be fine. Often, travel companies tolerate a certain amount of overbooking in order to cope with no-shows. Conversely, some hotels always keep a few rooms clear even when they are fully booked, in order to be able to swap a guest out of a room with problems or to accommodate a high-status late booking. Some might even cancel the booking with an apology once they detect the conflict, reasoning that the cost of that is less than the cost of losing bookings on network failures.

The classic example of allowing inconsistent writes is the shopping cart, as discussed in Dynamo [Amazon’s Dynamo]. In this case you are always allowed to write to your shopping cart, even if network failures mean you end up with multiple shopping carts. The checkout process can merge the two shopping carts by putting the union of the items from the carts into a single cart and returning that. Almost always that’s the correct answer, but if not, the user gets the opportunity to look at the cart before completing the order.

The lesson here is that although most software developers treat update consistency as The Way Things Must Be, there are cases where you can deal gracefully with inconsistent answers to requests. These situations are closely tied to the domain and require domain knowledge to know how to resolve. Thus you can’t usually look to solve them purely within the development team; you have to talk to domain experts. If you can find a way to handle inconsistent updates, this gives you more options to increase availability and performance. For a shopping cart, it means that shoppers can always shop, and do so quickly. And as Patriotic Americans, we know how vital it is to support Our Retail Destiny.

A similar logic applies to read consistency. If you are trading financial instruments over a computerized exchange, you may not be able to tolerate any data that isn’t right up to date. However, if you are posting a news item to a media website, you may be able to tolerate old pages for minutes.

In these cases you need to know how tolerant you are of stale reads, and how long the inconsistency window can be, often in terms of the average length, worst case, and some measure of the distribution for the lengths. Different data items may have different tolerances for staleness, and thus may need different settings in your replication configuration.

Advocates of NoSQL often say that instead of following the ACID properties of relational transactions, NoSQL systems follow the BASE properties (Basically Available, Soft state, Eventual consistency) [Brewer]. Although we feel we ought to mention the BASE acronym here, we don’t think it’s very useful. The acronym is even more contrived than ACID, and neither “basically available” nor “soft state” has been well defined. We should also stress that when Brewer introduced the notion of BASE, he saw the tradeoff between ACID and BASE as a spectrum, not a binary choice.

We’ve included this discussion of the CAP theorem because it’s often used (and abused) when talking about the tradeoffs involving consistency in distributed databases. However, it’s usually better to think not about the tradeoff between consistency and availability but rather between consistency and latency. We can summarize much of the discussion about consistency in distribution by saying that we can improve consistency by getting more nodes involved in the interaction, but each node we add increases the response time of that interaction. We can then think of availability as the limit of latency that we’re prepared to tolerate; once latency gets too high, we give up and treat the data as unavailable, which neatly fits its definition in the context of CAP.
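As an illustration of the shopping cart merge mentioned earlier in this section, the sketch below takes the union of two divergent carts. The function name and the cart layout (item name mapped to quantity) are hypothetical, and keeping the larger quantity when an item appears in both carts is our assumption; the right rule is a domain decision.

```python
from collections import Counter

def merge_carts(*carts):
    """Union of divergent carts; an item present in several carts keeps its largest quantity."""
    merged = Counter()
    for cart in carts:
        merged |= Counter(cart)  # Counter union takes the per-item maximum
    return dict(merged)

# Two carts for the same shopper that diverged during a network failure.
london_cart = {"talisker": 1, "glasses": 2}
mumbai_cart = {"talisker": 1, "coasters": 4}

print(merge_carts(london_cart, mumbai_cart))
```

A union can bring back an item the shopper removed on one node, which is why the text gives the user a chance to review the cart before completing the order.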

5.4. Relaxing Durability

So far we’ve talked about consistency, which is most of what people mean when they talk about the ACID properties of database transactions. The key to Consistency is serializing requests by forming Atomic, Isolated work units. But most people would scoff at relaxing durability; after all, what is the point of a data store if it can lose updates?

As it turns out, there are cases where you may want to trade off some durability for higher performance. If a database can run mostly in memory, apply updates to its in-memory representation, and periodically flush changes to disk, then it may be able to provide substantially higher responsiveness to requests. The cost is that, should the server crash, any updates since the last flush will be lost.

One example of where this tradeoff may be worthwhile is storing user-session state. A big website may have many users and keep temporary information about what each user is doing in some kind of session state. There’s a lot of activity on this state, creating lots of demand, which affects the responsiveness of the website. The vital point is that losing the session data isn’t too much of a tragedy: it will create some annoyance, but maybe less than a slower website would cause. This makes it a good candidate for nondurable writes. Often, you can specify the durability needs on a call-by-call basis, so that more important updates can force a flush to disk.

Another example of relaxing durability is capturing telemetric data from physical devices. It may be that you’d rather capture data at a faster rate, at the cost of missing the last updates should the server go down.

Another class of durability tradeoffs comes up with replicated data. A failure of replication durability occurs when a node processes an update but fails before that update is replicated to the other nodes. A simple case of this may happen if you have a master-slave distribution model where the slaves appoint a new master automatically should the existing master fail. If a master does fail, any writes not passed on to the replicas will effectively become lost. Should the master come back online, those updates will conflict with updates that have happened since. We think of this as a durability problem because you think your update has succeeded since the master acknowledged it, but a master node failure caused it to be lost.

If you’re sufficiently confident in bringing the master back online rapidly, this is a reason not to auto-failover to a slave. Otherwise, you can improve replication durability by ensuring that the master waits for some replicas to acknowledge the update before the master acknowledges it to the client. Obviously, however, that will slow down updates and make the cluster unavailable if slaves fail, so, again, we have a tradeoff, depending upon how vital durability is. As with basic durability, it’s useful for individual calls to indicate what level of durability they need.
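The idea of choosing durability call by call can be sketched with a toy store. Everything here is hypothetical: the class, the `durable` flag, and the use of a second dictionary to stand in for the disk are assumptions for illustration, not the interface of any real database.

```python
class MostlyInMemoryStore:
    """Hypothetical store: writes go to memory; `disk` stands in for what survives a crash."""

    def __init__(self):
        self.memory = {}
        self.disk = {}

    def put(self, key, value, durable=False):
        self.memory[key] = value
        if durable:
            self.flush()  # the caller asked for durability on this call

    def flush(self):
        # Stands in for writing the current in-memory state to disk.
        self.disk = dict(self.memory)

    def crash_and_restart(self):
        # Everything written since the last flush is gone.
        self.memory = dict(self.disk)


store = MostlyInMemoryStore()
store.put("order:42", "paid", durable=True)     # important: force a flush
store.put("session:alice", "browsing whisky")   # cheap, nondurable write
store.crash_and_restart()
print(store.memory)  # {'order:42': 'paid'}
```

The session entry is lost in the crash while the order survives, which is the tradeoff the text describes for session state.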

5.5. Quorums

When you’re trading off consistency or durability, it’s not an all-or-nothing proposition. The more nodes you involve in a request, the higher the chance of avoiding an inconsistency. This naturally leads to the question: How many nodes need to be involved to get strong consistency?

Imagine some data replicated over three nodes. You don’t need all nodes to acknowledge a write to ensure strong consistency; all you need is two of them, a majority. If you have conflicting writes, only one can get a majority. This is referred to as a write quorum and expressed in a slightly pretentious inequality of W > N/2, meaning the number of nodes participating in the write (W) must be more than half the number of nodes involved in replication (N). The number of replicas is often called the replication factor.

Similarly to the write quorum, there is the notion of read quorum: How many nodes you need to contact to be sure you have the most up-to-date change. The read quorum is a bit more complicated because it depends on how many nodes need to confirm a write.

Let’s consider a replication factor of 3. If all writes need two nodes to confirm (W = 2) then we need to contact at least two nodes to be sure we’ll get the latest data. If, however, writes are only confirmed by a single node (W = 1) we need to talk to all three nodes to be sure we have the latest updates. In this case, since we don’t have a write quorum, we may have an update conflict, but by contacting enough nodes on the read we can be sure to detect it. Thus we can get strongly consistent reads even if we don’t have strong consistency on our writes.

This relationship between the number of nodes you need to contact for a read (R), those confirming a write (W), and the replication factor (N) can be captured in an inequality: You can have a strongly consistent read if R + W > N.

These inequalities are written with a peer-to-peer distribution model in mind. If you have a master-slave distribution, you only have to write to the master to avoid write-write conflicts, and similarly only read from the master to avoid read-write conflicts. With this notation, it is common to confuse the number of nodes in the cluster with the replication factor, but these are often different. I may have 100 nodes in my cluster, but only have a replication factor of 3, with most of the distribution occurring due to sharding.

Indeed most authorities suggest that a replication factor of 3 is enough to have good resilience. This allows a single node to fail while still maintaining quora for reads and writes. If you have automatic rebalancing, it won’t take too long for the cluster to create a third replica, so the chances of losing a second replica before a replacement comes up are slight.

The number of nodes participating in an operation can vary with the operation. When writing, we might require a quorum for some types of updates but not others, depending on how much we value consistency and availability. Similarly, a read that needs speed but can tolerate staleness should contact fewer nodes.

Often you may need to take both into account. If you need fast, strongly consistent reads, you could require writes to be acknowledged by all the nodes, thus allowing reads to contact only one (N = 3, W = 3, R = 1). That would mean that your writes are slow, since they have to contact all three nodes, and you would not be able to tolerate losing a node. But in some circumstances that may be the tradeoff to make.
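The two inequalities, W > N/2 and R + W > N, are easy to check mechanically. The helper names below are our own; the sketch just tabulates the cases discussed above for a replication factor of 3.

```python
def has_write_quorum(w, n):
    """True when W > N/2, so conflicting writes cannot both succeed."""
    return w > n / 2

def has_consistent_read(r, w, n):
    """True when R + W > N, so every read overlaps the latest confirmed write."""
    return r + w > n

def min_read_nodes(w, n):
    """Smallest R that satisfies R + W > N."""
    return n - w + 1

N = 3
for w in (1, 2, 3):
    print(f"W={w}: write quorum={has_write_quorum(w, N)}, "
          f"nodes to read={min_read_nodes(w, N)}")
# W=1: write quorum=False, nodes to read=3
# W=2: write quorum=True, nodes to read=2
# W=3: write quorum=True, nodes to read=1
```

The last row is the N = 3, W = 3, R = 1 configuration: reads touch a single node, at the price of writes that need every replica to be up.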

The point to all of this is that you have a range of options to work with and can choose which combination of problems and advantages to prefer. Some writers on NoSQL talk about a simple tradeoff between consistency and availability; we hope you now realize that it’s more flexible, and more complicated, than that.

5.6. Further Reading

There are all sorts of interesting blog posts and papers on the Internet about consistency in distributed systems, but the most helpful source for us was [Tanenbaum and Van Steen]. It does an excellent job of organizing much of the fundamentals of distributed systems and is the best place to go if you’d like to delve deeper than we have in this chapter.

As we were finishing this book, IEEE Computer had a special issue [IEEE Computer Feb 2012] on the growing influence of the CAP theorem, which is a helpful source of further clarification for this topic.

5.7. Key Points

• Write-write conflicts occur when two clients try to write the same data at the same time. Read-write conflicts occur when one client reads inconsistent data in the middle of another client’s write.

• Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect conflicts and fix them.

• Distributed systems see read-write conflicts due to some nodes having received updates while other nodes have not. Eventual consistency means that at some point the system will become consistent once all the writes have propagated to all the nodes.

• Clients usually want read-your-writes consistency, which means a client can write and then immediately read the new value. This can be difficult if the read and the write happen on different nodes.

• To get good consistency, you need to involve many nodes in data operations, but this increases latency. So you often have to trade off consistency versus latency.

• The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency.

• Durability can also be traded off against latency, particularly if you want to survive failures with replicated data.

• You do not need to contact all replicas to preserve strong consistency with replication; you just need a large enough quorum.

Chapter 6. Version Stamps

Many critics of NoSQL databases focus on the lack of support for transactions. Transactions are a useful tool that helps programmers support consistency. One reason why many NoSQL proponents worry less about a lack of transactions is that aggregate-oriented NoSQL databases do support atomic updates within an aggregate, and aggregates are designed so that their data forms a natural unit of update. That said, it’s true that transactional needs are something to take into account when you decide what database to use.

As part of this, it’s important to remember that transactions have limitations. Even within a transactional system we still have to deal with updates that require human intervention and usually cannot be run within transactions because they would involve holding a transaction open for too long. We can cope with these using version stamps, which turn out to be handy in other situations as well, particularly as we move away from the single-server distribution model.

6.1. Business and System Transactions

The need to support update consistency without transactions is actually a common feature of systems even when they are built on top of transactional databases. When users think about transactions, they usually mean business transactions. A business transaction may be something like browsing a product catalog, choosing a bottle of Talisker at a good price, filling in credit card information, and confirming the order. Yet all of this usually won’t occur within the system transaction provided by the database because this would mean locking the database elements while the user is trying to find their credit card and gets called off to lunch by their colleagues.

Usually applications only begin a system transaction at the end of the interaction with the user, so that the locks are only held for a short period of time. The problem, however, is that calculations and decisions may have been made based on data that’s changed. The price list may have updated the price of the Talisker, or someone may have updated the customer’s address, changing the shipping charges.

The broad techniques for handling this are offline concurrency [Fowler PoEAA], useful in NoSQL situations too. A particularly useful approach is the Optimistic Offline Lock [Fowler PoEAA], a form of conditional update where a client operation rereads any information that the business transaction relies on and checks that it hasn’t changed since it was originally read and displayed to the user. A good way of doing this is to ensure that records in the database contain some form of version stamp: a field that changes every time the underlying data in the record changes. When you read the data you keep a note of the version stamp, so that when you write data you can check to see if the version has changed.

You may have come across this technique with updating resources with HTTP [HTTP]. One way of doing this is to use etags. Whenever you get a resource, the server responds with an etag in the header. This etag is an opaque string that indicates the version of the resource. If you then update that resource, you can use a conditional update by supplying the etag that you got from your last GET. If the resource has changed on the server, the etags won’t match and the server will refuse the update, returning a 412 (Precondition Failed) response.

Some databases provide a similar mechanism of conditional update that allows you to ensure updates won’t be based on stale data. You can do this check yourself, although you then have to ensure no other thread can run against the resource between your read and your update. (Sometimes this is called a compare-and-set (CAS) operation, whose name comes from the CAS operations done in processors. The difference is that a processor CAS compares a value before setting it, while a database conditional update compares a version stamp of the value.)

There are various ways you can construct your version stamps. You can use a counter, always incrementing it when you update the resource. Counters are useful since they make it easy to tell if one version is more recent than another. On the other hand, they require the server to generate the counter value, and also need a single master to ensure the counters aren’t duplicated.

Another approach is to create a GUID, a large random number that’s for practical purposes guaranteed to be unique. These use some combination of dates, hardware information, and whatever other sources of randomness they can pick up. The nice thing about GUIDs is that they can be generated by anyone without realistic risk of a duplicate; a disadvantage is that they are large and can’t be compared directly for recentness.

A third approach is to make a hash of the contents of the resource. With a big enough hash key size, a content hash can be globally unique like a GUID and can also be generated by anyone; the advantage is that they are deterministic: any node will generate the same content hash for the same resource data. However, like GUIDs they can’t be directly compared for recentness, and they can be lengthy.

A fourth approach is to use the timestamp of the last update. Like counters, they are reasonably short and can be directly compared for recentness, yet have the advantage of not needing a single master. Multiple machines can generate timestamps, but to work properly, their clocks have to be kept in sync. One node with a bad clock can cause all sorts of data corruptions. There’s also a danger that if the timestamp is too coarse you can get duplicates: it’s no good using timestamps of a millisecond precision if you get many updates per millisecond.

You can blend the advantages of these different version stamp schemes by using more than one of them to create a composite stamp. For example, CouchDB uses a combination of counter and content hash. Most of the time this allows version stamps to be compared for recentness, even when you use peer-to-peer replication. Should two peers update at the same time, the combination of the same count and different content hashes makes it easy to spot the conflict.

As well as helping to avoid update conflicts, version stamps are also useful for providing session consistency (p. 52).
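A conditional update guarded by a counter version stamp might look like the sketch below. The VersionedStore class and its method names are hypothetical, and the code is single-threaded; as noted above, a real implementation has to make the check and the write atomic.

```python
class StaleVersionError(Exception):
    """Raised when the caller's version stamp no longer matches the stored record."""


class VersionedStore:
    """Hypothetical single-node store; each record carries a counter version stamp."""

    def __init__(self):
        self._records = {}  # key -> (version, value)

    def read(self, key):
        # A missing record is reported as version 0 with no value.
        return self._records.get(key, (0, None))

    def conditional_update(self, key, expected_version, value):
        # Assumption: nothing else runs between this check and the write below.
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleVersionError(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._records[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.conditional_update("price:talisker", 0, 40)
version, price = store.read("price:talisker")            # the user is shown v1
store.conditional_update("price:talisker", version, 45)  # someone else updates first
try:
    store.conditional_update("price:talisker", version, 42)  # still based on v1
except StaleVersionError as err:
    print("refused:", err)
```

The refusal plays the same role as the 412 response in the HTTP case: the client has to reread the record and decide what to do with its stale decision.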

6.2. Version Stamps on Multiple Nodes

The basic version stamp works well when you have a single authoritative source for data, such as a single server or master-slave replication. In that case the version stamp is controlled by the master. Any slaves follow the master’s stamps. But this system has to be enhanced in a peer-to-peer distribution model because there’s no longer a single place to set the version stamps.

If you’re asking two nodes for some data, you run into the chance that they may give you different answers. If this happens, your reaction may vary depending on the cause of that difference. It may be that an update has only reached one node but not the other, in which case you can accept the latest (assuming you can tell which one that is). Alternatively, you may have run into an inconsistent update, in which case you need to decide how to deal with that. In this situation, a simple GUID or etag won’t suffice, since these don’t tell you enough about the relationships.

The simplest form of version stamp is a counter. Each time a node updates the data, it increments the counter and puts the value of the counter into the version stamp. If you have blue and green slave replicas of a single master, and the blue node answers with a version stamp of 4 and the green node with 6, you know that the green’s answer is more recent.

In multiple-master cases, we need something fancier. One approach, used by distributed version control systems, is to ensure that all nodes contain a history of version stamps. That way you can see if the blue node’s answer is an ancestor of the green’s answer. This would either require the clients to hold onto version stamp histories, or the server nodes to keep version stamp histories and include them when asked for data. This also detects an inconsistency, which we would see if we get two version stamps and neither of them has the other in their histories. Although version control systems keep these kinds of histories, they aren’t found in NoSQL databases.

A simple but problematic approach is to use timestamps. The main problem here is that it’s usually difficult to ensure that all the nodes have a consistent notion of time, particularly if updates can happen rapidly. Should a node’s clock get out of sync, it can cause all sorts of trouble. In addition, you can’t detect write-write conflicts with timestamps, so it would only work well for the single-master case, and then a counter is usually better.

The most common approach used by peer-to-peer NoSQL systems is a special form of version stamp which we call a vector stamp. In essence, a vector stamp is a set of counters, one for each node. A vector stamp for three nodes (blue, green, black) would look something like [blue: 43, green: 54, black: 12]. Each time a node has an internal update, it updates its own counter, so an update in the green node would change the vector to [blue: 43, green: 55, black: 12]. Whenever two nodes communicate, they synchronize their vector stamps. There are several variations of exactly how this synchronization is done. We’re coining the term “vector stamp” as a general term in this book; you’ll also come across vector clocks and version vectors, which are specific forms of vector stamps that differ in how they synchronize.

By using this scheme you can tell if one version stamp is newer than another because the newer stamp will have all its counters greater than or equal to those in the older stamp. So [blue: 1, green: 2, black: 5] is newer than [blue: 1, green: 1, black: 5] since one of its counters is greater. If both stamps have a counter greater than the other, e.g. [blue: 1, green: 2, black: 5] and [blue: 2, green: 1, black: 5], then you have a write-write conflict.

There may be missing values in the vector, in which case we treat the missing value as 0. So [blue: 6, black: 2] would be treated as [blue: 6, green: 0, black: 2]. This allows you to easily add new nodes without invalidating the existing vector stamps.

Vector stamps are a valuable tool that spots inconsistencies, but doesn’t resolve them. Any conflict resolution will depend on the domain you are working in. This is part of the consistency/latency tradeoff. You either have to live with the fact that network partitions may make your system unavailable, or you have to detect and deal with inconsistencies.
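The comparison rules for vector stamps translate directly into code. In this sketch a stamp is a plain dictionary from node name to counter; the function names are ours, and the per-node maximum used in `merge` is just one common synchronization rule, assumed here for illustration.

```python
def compare(a, b):
    """Classify stamp a against stamp b: 'equal', 'newer', 'older', or 'conflict'."""
    nodes = set(a) | set(b)
    # A missing counter is treated as 0.
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"  # each side has seen an update the other has not
    if a_ahead:
        return "newer"
    if b_ahead:
        return "older"
    return "equal"

def merge(a, b):
    """Assumed synchronization rule: take the per-node maximum of both stamps."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


print(compare({"blue": 1, "green": 2, "black": 5},
              {"blue": 1, "green": 1, "black": 5}))   # newer
print(compare({"blue": 1, "green": 2, "black": 5},
              {"blue": 2, "green": 1, "black": 5}))   # conflict
print(compare({"blue": 6, "black": 2},
              {"blue": 6, "green": 0, "black": 2}))   # equal
```

Detecting the conflict is all this does; what to do with two conflicting values remains a domain decision.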

6.3. Key Points

• Version stamps help you detect concurrency conflicts. When you read data, then update it, you can check the version stamp to ensure nobody updated the data between your read and write.

• Version stamps can be implemented using counters, GUIDs, content hashes, timestamps, or a combination of these.

• With distributed systems, a vector of version stamps allows you to detect when different nodes have conflicting updates.

Chapter 7. Map-Reduce

The rise of aggregate-oriented databases is in large part due to the growth of clusters. Running on a cluster means you have to make your tradeoffs in data storage differently than when running on a single machine. Clusters don’t just change the rules for data storage; they also change the rules for computation. If you store lots of data on a cluster, processing that data efficiently means you have to think differently about how you organize your processing.

With a centralized database, there are generally two ways you can run the processing logic against it: either on the database server itself or on a client machine. Running it on a client machine gives you more flexibility in choosing a programming environment, which usually makes for programs that are easier to create or extend. This comes at the cost of having to shlep lots of data from the database server. If you need to hit a lot of data, then it makes sense to do the processing on the server, paying the price in programming convenience and increasing the load on the database server.

When you have a cluster, there is good news immediately: you have lots of machines to spread the computation over. However, you also still need to try to reduce the amount of data that needs to be transferred across the network by doing as much processing as you can on the same node as the data it needs.

The map-reduce pattern (a form of Scatter-Gather [Hohpe and Woolf]) is a way to organize processing in such a way as to take advantage of multiple machines on a cluster while keeping as much processing and the data it needs together on the same machine. It first gained prominence with Google’s MapReduce framework [Dean and Ghemawat]. A widely used open-source implementation is part of the Hadoop project, although several databases include their own implementations. As with most patterns, there are differences in detail between these implementations, so we’ll concentrate on the general concept.

The name “map-reduce” reveals its inspiration from the map and reduce operations on collections in functional programming languages.

7.1. Basic Map-Reduce

To explain the basic idea, we’ll start from an example we’ve already flogged to death: that of customers and orders. Let’s assume we have chosen orders as our aggregate, with each order having line items. Each line item has a product ID, quantity, and the price charged. This aggregate makes a lot of sense as usually people want to see the whole order in one access. We have lots of orders, so we’ve sharded the dataset over many machines.

However, sales analysis people want to see a product and its total revenue for the last seven days. This report doesn’t fit the aggregate structure that we have, which is the downside of using aggregates. In order to get the product revenue report, you’ll have to visit every machine in the cluster and examine many records on each machine.

This is exactly the kind of situation that calls for map-reduce. The first stage in a map-reduce job is the map. A map is a function whose input is a single aggregate and whose output is a bunch of key-value pairs. In this case, the input would be an order. The output would be key-value pairs corresponding to the line items. Each one would have the product ID as the key and an embedded map with the quantity and price as the values (see Figure 7.1).

Figure 7.1. A map function reads records from the database and emits key-value pairs.

Each application of the map function is independent of all the others. This allows them to be safely parallelized, so that a map-reduce framework can create efficient map tasks on each node and freely allocate each order to a map task. This yields a great deal of parallelism and locality of data access. For this example, we are just selecting a value out of the record, but there’s no reason why we can’t carry out some arbitrarily complex function as part of the map, providing it only depends on one aggregate’s worth of data.

A map operation only operates on a single record; the reduce function takes multiple map outputs with the same key and combines their values. So, a map function might yield 1000 line items from orders for “Database Refactoring”; the reduce function would reduce down to one, with the totals for the quantity and revenue. While the map function is limited to working only on data from a single aggregate, the reduce function can use all values emitted for a single key (see Figure 7.2).

Figure 7.2. A reduce function takes several key-value pairs with the same key and aggregates them into one.

The map-reduce framework arranges for map tasks to be run on the correct nodes to process all the documents and for data to be moved to the reduce function. To make it easier to write the reduce function, the framework collects all the values for a single key and calls the reduce function once with the key and the collection of all the values for that key. So to run a map-reduce job, you just need to write these two functions.
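The two functions for the product revenue report might look roughly like the following sketch. It is not tied to Hadoop or to any database: the class ProductRevenue, its record types, and the run method that stands in for the framework are all hypothetical, and the sketch assumes that the price on a line item is a per-unit price held in cents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only; all names here are assumptions, not a real framework API.
final class ProductRevenue {
    record LineItem(String productId, int quantity, long unitPriceCents) {}
    record Order(List<LineItem> lineItems) {}
    record Totals(int quantity, long revenueCents) {}

    // Map: one order (a single aggregate) in, one key-value pair out per line item.
    static List<Map.Entry<String, Totals>> map(Order order) {
        List<Map.Entry<String, Totals>> out = new ArrayList<>();
        for (LineItem item : order.lineItems()) {
            long revenue = item.quantity() * item.unitPriceCents();
            out.add(Map.entry(item.productId(), new Totals(item.quantity(), revenue)));
        }
        return out;
    }

    // Reduce: all values emitted for one key in, a single summary value out.
    static Totals reduce(String productId, List<Totals> values) {
        int quantity = 0;
        long revenue = 0;
        for (Totals t : values) {
            quantity += t.quantity();
            revenue += t.revenueCents();
        }
        return new Totals(quantity, revenue);
    }

    // Stand-in for the framework: group map output by key, then reduce each group.
    static Map<String, Totals> run(List<Order> orders) {
        Map<String, List<Totals>> grouped = new TreeMap<>();
        for (Order order : orders) {
            for (Map.Entry<String, Totals> pair : map(order)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Totals> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }
}
```

In a real cluster the grouping done by run would happen across machines; here it runs in one process purely to show the shape of the two functions.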

7.2. Partitioning and Combining

In the simplest form, we think of a map-reduce job as having a single reduce function. The outputs from all the map tasks running on the various nodes are concatenated together and sent into the reduce. While this will work, there are things we can do to increase the parallelism and to reduce the data transfer (see Figure 7.3).

Figure 7.3. Partitioning allows reduce functions to run in parallel on different keys.

The first thing we can do is increase parallelism by partitioning the output of the mappers. Each reduce function operates on the results of a single key. This is a limitation (it means you can’t do anything in the reduce that operates across keys), but it’s also a benefit in that it allows you to run multiple reducers in parallel. To take advantage of this, the results of the mapper are divided up based on the key on each processing node. Typically, multiple keys are grouped together into partitions. The framework then takes the data from all the nodes for one partition, combines it into a single group for that partition, and sends it off to a reducer. Multiple reducers can then operate on the partitions in parallel, with the final results merged together. (This step is also called “shuffling,” and the partitions are sometimes referred to as “buckets” or “regions.”)

The next problem we can deal with is the amount of data being moved from node to node between the map and reduce stages. Much of this data is repetitive, consisting of multiple key-value pairs for the same key. A combiner function cuts this data down by combining all the data for the same key into a single value (see Figure 7.4). A combiner function is, in essence, a reducer function; indeed, in many cases the same function can be used for combining as the final reduction. The reduce function needs a special shape for this to work: Its output must match its input. We call such a function a combinable reducer.
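A small sketch may help show how partitioning and combining fit together. It assumes the simplest combinable reducer, a sum of quantities, and uses hypothetical names throughout (CombineAndPartition, partitionFor, and so on); the hash-based partitioner is one common choice, not the only one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; names and the hash partitioner are assumptions.
final class CombineAndPartition {
    // A combinable reducer: its input values and its output have the same type.
    static long reduce(List<Long> quantities) {
        long total = 0;
        for (long q : quantities) total += q;
        return total;
    }

    // Combiner on one node: collapse that node's map output to one value per key
    // before anything is sent across the network.
    static Map<String, Long> combine(Map<String, List<Long>> nodeOutput) {
        Map<String, Long> combined = new HashMap<>();
        nodeOutput.forEach((key, values) -> combined.put(key, reduce(values)));
        return combined;
    }

    // Partitioner: every node must route a given key to the same partition.
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Final reduction over the already-combined output of all nodes.
    static Map<String, Long> reduceAll(List<Map<String, Long>> combinedPerNode) {
        Map<String, List<Long>> grouped = new HashMap<>();
        for (Map<String, Long> node : combinedPerNode) {
            node.forEach((key, value) ->
                    grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
        }
        return combine(grouped); // the same reduce function serves both steps
    }
}
```

Because the reducer's output matches its input, reduceAll can reuse the very function that ran as the combiner, which is the property the text calls combinable.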

Figure 7.4. Combining reduces data before sending it across the network.

Not all reduce functions are combinable. Consider a function that counts the number of unique customers for a particular product. The map function for such an operation would need to emit the product and the customer. The reducer can then combine them and count how many times each customer appears for a particular product, emitting the product and the count (see Figure 7.5). But this reducer’s output is different from its input, so it can’t be used as a combiner. You can still run a combining function here: one that just eliminates duplicate product-customer pairs, but it will be different from the final reducer.

Figure 7.5. This reduce function, which counts how many unique customers order a particular tea, is not combinable.
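The difference between the duplicate-eliminating combiner and the final counting reducer can be made concrete with a short sketch. The class and method names are hypothetical, and customers are represented simply as ID strings for one product.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only; names are assumptions.
final class UniqueCustomers {
    // Combiner: drops duplicate customers for one product. Customer IDs go in and
    // customer IDs come out, so it is safe to run repeatedly and on any node.
    static Set<String> dedupe(List<String> customerIds) {
        return new HashSet<>(customerIds);
    }

    // Final reducer: customer IDs in, a count out. Because the output type differs
    // from the input type, this function cannot double as a combiner.
    static int countUnique(List<Set<String>> combinedPerNode) {
        Set<String> all = new HashSet<>();
        for (Set<String> node : combinedPerNode) all.addAll(node);
        return all.size();
    }
}
```

If one node saw customers ann, bob, ann and another saw bob, cat, each node's combined set has two members, yet the product has three unique customers, not four. Adding per-node counts would overcount bob, which is why the count has to wait for the final reduction.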

When you have combining reducers, the map-reduce framework can safely run them not only in parallel (to reduce different partitions), but also in series to reduce the same partition at different times and places. In addition to allowing combining to occur on a node before data transmission, you can also start combining before mappers have finished. This provides a good bit of extra flexibility to the map-reduce processing. Some map-reduce frameworks require all reducers to be combining reducers, which maximizes this flexibility. If you need to do a noncombining reducer with one of these frameworks, you’ll need to separate the processing into pipelined map-reduce steps.

7.3. Composing Map-Reduce Calculations

The map-reduce approach is a way of thinking about concurrent processing that trades off flexibility in how you structure your computation for a relatively straightforward model for parallelizing the computation over a cluster. Since it’s a tradeoff, there are constraints on what you can do in your calculations. Within a map task, you can only operate on a single aggregate. Within a reduce task, you can only operate on a single key. This means you have to think differently about structuring your programs so they work well within these constraints.

One simple limitation is that you have to structure your calculations around operations that fit in well with the notion of a reduce operation. A good example of this is calculating averages. Let’s consider the kind of orders we’ve been looking at so far; suppose we want to know the average ordered quantity of each product. An important property of averages is that they are not composable: if I take two groups of orders, I can’t combine their averages alone. Instead, I need to take the total amount and the count of orders from each group, combine those, and then calculate the average from the combined sum and count (see Figure 7.6).

Figure 7.6. When calculating averages, the sum and count can be combined in the reduce calculation, but the average must be calculated from the combined sum and count.
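A minimal sketch of the sum-and-count approach, with a hypothetical SumCount record carrying the two composable values:

```java
// Illustrative only; the record and its method names are assumptions.
final class AverageQuantity {
    record SumCount(long sum, long count) {
        // Combinable: merging two SumCounts yields another SumCount.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        // The average is derived only at the end, from the combined values.
        double average() {
            return (double) sum / count;
        }
    }
}
```

As a worked case, take one group with sum 30 over 2 orders (average 15) and another with sum 60 over 1 order (average 60). Merging gives sum 90 over 3 orders, an average of 30. Averaging the two averages would give 37.5, which is wrong because it ignores how many orders each group holds.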

This notion of looking for calculations that reduce neatly also affects how we do counts. To make a count, the mapping function will emit count fields with a value of 1, which can be summed to get a total count (see Figure 7.7).

Figure 7.7. When making a count, each map emits 1, which can be summed to get a total.

7.3.1. A Two Stage Map-Reduce Example

As map-reduce calculations get more complex, it’s useful to break them down into stages using a pipes-and-filters approach, with the output of one stage serving as input to the next, rather like the pipelines in UNIX.

Consider an example where we want to compare the sales of products for each month in 2011 to the prior year. To do this, we’ll break the calculations down into two stages. The first stage will produce records showing the aggregate figures for a single product in a single month of the year. The second stage then uses these as inputs and produces the result for a single product by comparing one month’s results with the same month in the prior year (see Figure 7.8).

Figure 7.8. A calculation broken down into two map-reduce steps, which will be expanded in the next three figures

A first stage (Figure 7.9) would read the original order records and output a series of key-value pairs for the sales of each product per month.

Figure 7.9. Creating records for monthly sales of a product

This stage is similar to the map-reduce examples we’ve seen so far. The only new feature is using a composite key so that we can reduce records based on the values of multiple fields.

The second-stage mappers (Figure 7.10) process this output depending on the year. A 2011 record populates the current year quantity while a 2010 record populates a prior year quantity. Records for earlier years (such as 2009) don’t result in any mapping output being emitted.

Figure 7.10. The second stage mapper creates base records for year-on-year comparisons.

The reduce in this case (Figure 7.11) is a merge of records, where combining the values by summing allows two different year outputs to be reduced to a single value (with a calculation based on the reduced values thrown in for good measure).
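The second stage described above can be sketched as follows. The record shapes, the "product:month" string key, and the increase calculation are assumptions made for illustration; the first stage is taken as already done, so the input is its monthly output.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative only; names and record layouts are assumptions.
final class YearOnYear {
    // Stage-one output: total quantity for one product in one month of one year.
    record MonthlySales(String product, int year, int month, long quantity) {}

    // Stage-two value: an incomplete record with only one year's field filled in.
    record Comparison(long current, long prior) {
        Comparison merge(Comparison o) {
            return new Comparison(current + o.current, prior + o.prior);
        }

        // Fractional change on the prior year; not meaningful when prior is 0.
        double increase() {
            return (double) (current - prior) / prior;
        }
    }

    // Stage-two map: 2011 fills "current", 2010 fills "prior", other years emit nothing.
    static Optional<Map.Entry<String, Comparison>> map(MonthlySales s) {
        String key = s.product() + ":" + s.month();
        if (s.year() == 2011) return Optional.of(Map.entry(key, new Comparison(s.quantity(), 0)));
        if (s.year() == 2010) return Optional.of(Map.entry(key, new Comparison(0, s.quantity())));
        return Optional.empty();
    }

    // Stage-two reduce: merge the partial records per key by summing their fields.
    static Map<String, Comparison> run(List<MonthlySales> stageOneOutput) {
        Map<String, Comparison> merged = new TreeMap<>();
        for (MonthlySales s : stageOneOutput) {
            map(s).ifPresent(e -> merged.merge(e.getKey(), e.getValue(), Comparison::merge));
        }
        return merged;
    }
}
```

Feeding in December figures of 1200 for 2011, 1000 for 2010, and 500 for 2009 leaves a single merged record with current 1200 and prior 1000, an increase of 20%; the 2009 record produces no output at all.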

Figure 7.11. The reduction step is a merge of incomplete records.

Decomposing this report into multiple map-reduce steps makes it easier to write. Like many transformation examples, once you’ve found a transformation framework that makes it easy to compose steps, it’s usually easier to compose many small steps together than try to cram heaps of logic into a single step.

Another advantage is that the intermediate output may be useful for different outputs too, so you can get some reuse. This reuse is important as it saves time both in programming and in execution. The intermediate records can be saved in the data store, forming a materialized view (“Materialized Views,” p. 30). Early stages of map-reduce operations are particularly valuable to save since they often represent the heaviest amount of data access, so building them once as a basis for many downstream uses saves a lot of work. As with any reuse activity, however, it’s important to build them out of experience with real queries, as speculative reuse rarely fulfills its promise. So it’s important to look at the forms of various queries as they are built and factor out the common parts of the calculations into materialized views.

Map-reduce is a pattern that can be implemented in any programming language. However, the constraints of the style make it a good fit for languages specifically designed for map-reduce computations. Apache Pig [Pig], an offshoot of the Hadoop [Hadoop] project, is a language specifically built to make it easy to write map-reduce programs. It certainly makes it much easier to work with Hadoop than the underlying Java libraries. In a similar vein, if you want to specify map-reduce programs using an SQL-like syntax, there is Hive [Hive], another Hadoop offshoot.

The map-reduce pattern is important to know about even outside of the context of NoSQL databases. Google’s original map-reduce system operated on files stored on a distributed file system, an approach that’s used by the open-source Hadoop project. While it takes some thought to get used to the constraints of structuring computations in map-reduce steps, the result is a calculation that is inherently well-suited to running on a cluster. When dealing with high volumes of data, you need to take a cluster-oriented approach. Aggregate-oriented databases fit well with this style of calculation.

We think that in the next few years many more organizations will be processing the volumes of data that demand a cluster-oriented solution, and the map-reduce pattern will see more and more use.

7.3.2. Incremental Map-Reduce

The examples we’ve discussed so far are complete map-reduce computations, where we start with raw inputs and create a final output. Many map-reduce computations take a while to perform, even with clustered hardware, and new data keeps coming in, which means we need to rerun the computation to keep the output up to date. Starting from scratch each time can take too long, so often it’s useful to structure a map-reduce computation to allow incremental updates, so that only the minimum computation needs to be done.

The map stages of a map-reduce are easy to handle incrementally: only if the input data changes does the mapper need to be rerun. Since maps are isolated from each other, incremental updates are straightforward.

The more complex case is the reduce step, since it pulls together the outputs from many maps and any change in the map outputs could trigger a new reduction. This recomputation can be lessened depending on how parallel the reduce step is. If we are partitioning the data for reduction, then any partition that’s unchanged does not need to be re-reduced. Similarly, if there’s a combiner step, it doesn’t need to be rerun if its source data hasn’t changed.

If our reducer is combinable, there are some more opportunities for computation avoidance. If the changes are additive (that is, if we are only adding new records but are not changing or deleting any old records), then we can just run the reduce with the existing result and the new additions. If there are destructive changes, that is updates and deletes, then we can avoid some recomputation by breaking up the reduce operation into steps and only recalculating those steps whose inputs have changed: essentially, using a Dependency Network [Fowler DSL] to organize the computation.

The map-reduce framework controls much of this, so you have to understand how a specific framework supports incremental operation.
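For the additive case with a combinable reducer, the incremental update amounts to reducing the previous result together with the newly mapped pairs. The following sketch assumes a summing reducer and a hypothetical applyAdditions helper; how a given framework actually tracks and applies increments will differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; names are assumptions, and this covers additive changes alone.
final class IncrementalTotals {
    // Folds newly emitted (product, quantity) pairs into an existing reduced result.
    // Valid only when records are added, never changed or deleted, and when the
    // reducer (here, a sum) is combinable.
    static Map<String, Long> applyAdditions(Map<String, Long> existing,
                                            List<Map.Entry<String, Long>> newPairs) {
        Map<String, Long> updated = new HashMap<>(existing); // leave the old view intact
        for (Map.Entry<String, Long> pair : newPairs) {
            updated.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return updated;
    }
}
```

Only the new pairs are processed; the orders that produced the existing totals are never read again. Updates and deletes cannot be handled this way, since a sum alone does not record which contributions to take back out.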

7.4. Further Reading

If you’re going to use map-reduce calculations, your first port of call will be the documentation for the particular database you are using. Each database has its own approach, vocabulary, and quirks, and that’s what you’ll need to be familiar with. Beyond that, there is a need to capture more general information on how to structure map-reduce jobs to maximize maintainability and performance. We don’t have any specific books to point to yet, but we suspect that a good though easily overlooked source are books on Hadoop. Although Hadoop is not a database, it’s a tool that uses map-reduce heavily, so writing an effective map-reduce task with Hadoop is likely to be useful in other contexts (subject to the changes in detail between Hadoop and whatever systems you’re using).

7.5. Key Points

• Map-reduce is a pattern to allow computations to be parallelized over a cluster.

• The map task reads data from an aggregate and boils it down to relevant key-value pairs. Maps only read a single record at a time and can thus be parallelized and run on the node that stores the record.

• Reduce tasks take many values for a single key output from map tasks and summarize them into a single output. Each reducer operates on the result of a single key, so it can be parallelized by key.

• Reducers that have the same form for input and output can be combined into pipelines. This improves parallelism and reduces the amount of data to be transferred.

• Map-reduce operations can be composed into pipelines where the output of one reduce is the input to another operation’s map.

• If the result of a map-reduce computation is widely used, it can be stored as a materialized view.

• Materialized views can be updated through incremental map-reduce operations that only compute changes to the view instead of recomputing everything from scratch.

Part II: Implement

Chapter 8. Key-Value Databases

A key-value store is a simple hash table, primarily used when all access to the database is via primary key. Think of a table in a traditional RDBMS with two columns, such as ID and NAME, the ID column being the key and NAME column storing the value. In an RDBMS, the NAME column is restricted to storing data of type String. The application can provide an ID and VALUE and persist the pair; if the ID already exists the current value is overwritten, otherwise a new entry is created. Let’s look at how terminology compares in Oracle and Riak.

8.1. What Is a Key-Value Store

Key-value stores are the simplest NoSQL data stores to use from an API perspective. The client can either get the value for the key, put a value for a key, or delete a key from the data store. The value is a blob that the data store just stores, without caring or knowing what’s inside; it’s the responsibility of the application to understand what was stored. Since key-value stores always use primary-key access, they generally have great performance and can be easily scaled.

Some of the popular key-value databases are Riak [Riak], Redis (often referred to as Data Structure server) [Redis], Memcached DB and its flavors [Memcached], Berkeley DB [Berkeley DB], HamsterDB (especially suited for embedded use) [HamsterDB], Amazon DynamoDB [Amazon’s Dynamo] (not open-source), and Project Voldemort [Project Voldemort] (an open-source implementation of Amazon DynamoDB).

In some key-value stores, such as Redis, the aggregate being stored does not have to be a domain object; it could be any data structure. Redis supports storing lists, sets, hashes and can do range, diff, union, and intersection operations. These features allow Redis to be used in more different ways than a standard key-value store.

There are many more key-value databases and many new ones are being worked on at this time. For the sake of keeping discussions in this book easier we will focus mostly on Riak. Riak lets us store keys into buckets, which are just a way to segment the keys; think of buckets as flat namespaces for the keys.

If we wanted to store user session data, shopping cart information, and user preferences in Riak, we could just store all of them in the same bucket with a single key and single value for all of these objects. In this scenario, we would have a single object that stores all the data and is put into a single bucket (Figure 8.1).

Figure 8.1. Storing all the data in a single bucket

The downside of storing all the different objects (aggregates) in the single bucket would be that one bucket would store different types of aggregates, increasing the chance of key conflicts. An alternate approach would be to append the name of the object to the key, such as 288790b8a421_userProfile, so that we can get to individual objects as they are needed (Figure 8.2).
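The key design described above is easy to sketch without any database at all. In the following illustration an in-memory map stands in for a single bucket; the class SegmentedKeys and its methods are hypothetical and are not part of the Riak client.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only; an in-memory stand-in, not the Riak client API.
final class SegmentedKeys {
    private final Map<String, byte[]> bucket = new HashMap<>(); // one shared bucket

    // Builds keys such as 288790b8a421_userProfile from a session ID and object name.
    static String keyFor(String sessionId, String objectName) {
        return sessionId + "_" + objectName;
    }

    void put(String sessionId, String objectName, byte[] value) {
        bucket.put(keyFor(sessionId, objectName), value);
    }

    // Returns null when nothing is stored under the composed key.
    byte[] get(String sessionId, String objectName) {
        return bucket.get(keyFor(sessionId, objectName));
    }
}
```

Each kind of object now has its own key, so reading the user profile no longer drags the shopping cart along with it. The price is that the naming convention lives in application code, which every reader and writer must follow.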

Figure 8.2. Change the key design to segment the data in a single bucket.

We could also create buckets which store specific data. In Riak, they are known as domain buckets, which allow the serialization and deserialization to be handled by the client driver.

    Bucket bucket = client.fetchBucket(bucketName).execute();
    DomainBucket<UserProfile> profileBucket =
        DomainBucket.builder(bucket, UserProfile.class).build();

Using domain buckets or different buckets for different objects (such as UserProfile and ShoppingCart) segments the data across different buckets, allowing you to read only the object you need without having to change key design.

Key-value stores such as Redis also support storing various data structures, which can be sets, hashes, strings, and so on. This feature can be used to store lists of things, like states or addressTypes, or an array of a user’s visits.

8.2. Key-Value Store Features

While using any NoSQL data stores, there is an inevitable need to understand how the features compare to the standard RDBMS data stores that we are so used to. The primary reason is to understand what features are missing and how the application architecture needs to change to better use the features of a key-value data store. Some of the features we will discuss for all the NoSQL data stores are consistency, transactions, query features, structure of the data, and scaling.

8.2.1. Consistency

Consistency is applicable only for operations on a single key, since these operations are either a get, put, or delete on a single key. Optimistic writes can be performed, but are very expensive to implement, because a change in value cannot be determined by the data store.

In distributed key-value store implementations like Riak, the eventually consistent (p. 50) model of consistency is implemented. Since the value may have already been replicated to other nodes, Riak has two ways of resolving update conflicts: either the newest write wins and older writes lose, or both (all) values are returned, allowing the client to resolve the conflict.

In Riak, these options can be set up during the bucket creation. Buckets are just a way to namespace keys so that key collisions can be reduced; for example, all customer keys may reside in the customer bucket. When creating a bucket, default values for consistency can be provided, for example that a write is considered good only when the data is consistent across all the nodes where the data is stored.

    Bucket bucket = connection
        .createBucket(bucketName)
        .withRetrier(attempts(3))
        .allowSiblings(siblingsAllowed)
        .nVal(numberOfReplicasOfTheData)
        .w(numberOfNodesToRespondToWrite)
        .r(numberOfNodesToRespondToRead)
        .execute();

If we need data in every node to be consistent, we can increase the numberOfNodesToRespondToWrite set by w to be the same as nVal. Of course doing that will decrease the write performance of the cluster. To improve on write or read conflicts, we can change the allowSiblings flag during bucket creation: If it is set to false, we let the last write win and do not create siblings.

8.2.2. Transactions

Different products of the key-value store kind have different specifications of transactions. Generally speaking, there are no guarantees on the writes. Many data stores do implement transactions in different ways. Riak uses the concept of quorum (“Quorums,” p. 57), implemented by supplying the W value (the number of replicas that must acknowledge the write) during the write API call.

Assume we have a Riak cluster with a replication factor of 5 and we supply the W value of 3. When writing, the write is reported as successful only when it is written and reported as a success on at least three of the nodes. This allows Riak to have write tolerance; in our example, with N equal to 5 and with a W value of 3, the cluster can tolerate N - W = 2 nodes being down for write operations, though we would still have lost some data on those nodes for read.

8.2.3. Query Features

All key-value stores can query by the key, and that’s about it. If you have requirements to query by using some attribute of the value column, it’s not possible to use the database: Your application needs to read the value to figure out if the attribute meets the conditions.

Query by key also has an interesting side effect. What if we don’t know the key, especially during ad-hoc querying during debugging? Most of the data stores will not give you a list of all the primary keys; even if they did, retrieving lists of keys and then querying for the value would be very cumbersome. Some key-value databases get around this by providing the ability to search inside the value, such as Riak Search that allows you to query the data just like you would query it using Lucene indexes.

While using key-value stores, lots of thought has to be given to the design of the key. Can the key be generated using some algorithm? Can the key be provided by the user (user ID, email, etc.)? Or derived from timestamps or other data that can be derived outside of the database?

These query characteristics make key-value stores likely candidates for storing session data (with the session ID as the key), shopping cart data, user profiles, and so on. The expiry_secs property can be used to expire keys after a certain time interval, especially for session/shopping cart objects.

    Bucket bucket = getBucket(bucketName);
    IRiakObject riakObject = bucket.store(key, value).execute();

When writing to the Riak bucket using the store API, the object is stored for the key provided. Similarly, we can get the value stored for the key using the fetch API.

    Bucket bucket = getBucket(bucketName);
    IRiakObject riakObject = bucket.fetch(key).execute();
    byte[] bytes = riakObject.getValue();
    String value = new String(bytes);

Riak provides an HTTP-based interface, so that all operations can be performed from the web browser or on the command line using curl. Let’s save this data to Riak:

    {
      "lastVisit": 1324669989288,
      "user": {
        "customerId": "91cfdf5bcb7c",
        "name": "buyer",
        "countryCode": "US",
        "tzOffset": 0
      }
    }

Use the curl command to POST the data, storing the data in the session bucket with the key of a7e618d9db25 (we have to provide this key):

    curl -v -X POST \
      -d '{"lastVisit":1324669989288,"user":{"customerId":"91cfdf5bcb7c","name":"buyer","countryCode":"US","tzOffset":0}}' \
      -H "Content-Type: application/json" \
      http://localhost:8098/buckets/session/keys/a7e618d9db25

The data for the key a7e618d9db25 can be fetched by using the curl command:

    curl -i http://localhost:8098/buckets/session/keys/a7e618d9db25

8.2.4. Structure of Data

Key-value databases don’t care what is stored in the value part of the key-value pair. The value can be a blob, text, JSON, XML, and so on. In Riak, we can use the Content-Type in the POST request to specify the data type.

8.2.5. Scaling

Many key-value stores scale by using sharding (“Sharding,” p. 38). With sharding, the value of the key determines on which node the key is stored. Let’s assume we are sharding by the first character of the key; if the key is f4b19d79587d, which starts with an f, it will be sent to a different node than the key ad9c7a396542. This kind of sharding setup can increase performance as more nodes are added to the cluster.

Sharding also introduces some problems. If the node used to store f goes down, the data stored on that node becomes unavailable, nor can new data be written with keys that start with f.

Data stores such as Riak allow you to control the aspects of the CAP Theorem (“The CAP Theorem,” p. 53): N (number of nodes to store the key-value replicas), R (number of nodes that have to have the data being fetched before the read is considered successful), and W (the number of nodes the write has to be written to before it is considered successful).

Let’s assume we have a 5-node Riak cluster. Setting N to 3 means that all data is replicated to at least three nodes, setting R to 2 means any two nodes must reply to a GET request for it to be considered successful, and setting W to 2 ensures that the PUT request is written to two nodes before the write is considered successful.

These settings allow us to fine-tune how many node failures read or write operations can tolerate. Based on our need, we can change these values for better read availability or write availability. Generally speaking, choose a W value to match your consistency needs; these values can be set as defaults during bucket creation.
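The arithmetic behind these settings can be written down as a small sketch. The record QuorumSettings and its method names are hypothetical; the overlap check R + W > N is the standard quorum condition for a read to reach at least one replica holding the latest acknowledged write, and it is included here as general background rather than as a description of how Riak evaluates its settings.

```java
// Illustrative only; not part of any Riak client library.
record QuorumSettings(int n, int r, int w) {
    QuorumSettings {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("require 1 <= r <= n and 1 <= w <= n");
        }
    }

    // Replicas that may be down while writes still succeed: N - W.
    int writeFailuresTolerated() { return n - w; }

    // Replicas that may be down while reads still succeed: N - R.
    int readFailuresTolerated() { return n - r; }

    // True when every set of R replicas shares a member with every set of W replicas.
    boolean readAndWriteSetsOverlap() { return r + w > n; }
}
```

With N = 3, R = 2, and W = 2 from the example above, one replica can be down for reads and one for writes, and since 2 + 2 > 3 the read and write sets always overlap. With N = 5 and W = 3, writes tolerate N - W = 2 failed replicas.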

8.3. Suitable Use Cases

Let’s discuss some of the problems where key-value stores are a good fit.

8.3.1. Storing Session Information

Generally, every web session is unique and is assigned a unique sessionid value. Applications that store the sessionid on disk or in an RDBMS will greatly benefit from moving to a key-value store, since everything about the session can be stored by a single PUT request or retrieved using GET. This single-request operation makes it very fast, as everything about the session is stored in a single object. Solutions such as Memcached are used by many web applications, and Riak can be used when availability is important.

8.3.2. User Profiles, Preferences

Almost every user has a unique userId, username, or some other attribute, as well as preferences such as language, color, timezone, which products the user has access to, and so on. This can all be put into an object, so getting preferences of a user takes a single GET operation. Similarly, product profiles can be stored.

8.3.3. Shopping Cart Data

E-commerce websites have shopping carts tied to the user. As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value where the key is the userid. A Riak cluster would be best suited for these kinds of applications.

8.4. When Not to Use

There are problem spaces where key-value stores are not the best solution.

8.4.1. Relationships among Data

If you need to have relationships between different sets of data, or correlate the data between different sets of keys, key-value stores are not the best solution to use, even though some key-value stores provide link-walking features.

8.4.2. Multioperation Transactions

If you’re saving multiple keys and there is a failure to save any one of them, and you want to revert or roll back the rest of the operations, key-value stores are not the best solution to be used.

8.4.3. Query by Data

If you need to search the keys based on something found in the value part of the key-value pairs, then key-value stores are not going to perform well for you. There is no way to inspect the value on the database side, with the exception of some products like Riak Search or indexing engines like Lucene [Lucene] or Solr [Solr].

8.4.4. Operations by Sets

Since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.

Chapter 9. Document Databases

Documents are the main concept in document databases. The database stores and retrieves documents, which can be XML, JSON, BSON, and so on. These documents are self-describing, hierarchical tree data structures which can consist of maps, collections, and scalar values. The documents stored are similar to each other but do not have to be exactly the same. Document databases store documents in the value part of the key-value store; think about document databases as key-value stores where the value is examinable. Let’s look at how terminology compares in Oracle and MongoDB.

The _id is a special field that is found on all documents in Mongo, just like ROWID in Oracle. In MongoDB, _id can be assigned by the user, as long as it is unique.

9.1. What Is a Document Database?

    { "firstname": "Martin",
      "likes": [ "Biking", "Photography" ],
      "lastcity": "Boston",
      "lastVisited":
    }

The above document can be considered a row in a traditional RDBMS. Let’s look at another document:

    { "firstname": "Pramod",
      "citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
      "addresses": [
        { "state": "AK",
          "city": "DILLINGHAM",
          "type": "R"
        },
        { "state": "MH",
          "city": "PUNE",
          "type": "R"
        }
      ],
      "lastcity": "Chicago"
    }

Looking at the documents, we can see that they are similar, but have differences in attribute names. This is allowed in document databases. The schema of the data can differ across documents, but these documents can still belong to the same collection, unlike an RDBMS where every row in a table has to follow the same schema. We represent a list of cities visited as an array, or a list of addresses as a list of documents embedded inside the main document. Embedding child documents as subobjects inside documents provides for easy access and better performance.

If you look at the documents, you will see that some of the attributes are similar, such as firstname or city. At the same time, there are attributes in the second document which do not exist in the first document, such as addresses, while likes is in the first document but not the second.

This different representation of data is not the same as in RDBMS where every column has to be defined, and if it does not have data it is marked as empty or set to null. In documents, there are no empty attributes; if a given attribute is not found, we assume that it was not set or not relevant to the document. Documents allow for new attributes to be created without the need to define them or to change the existing documents.

Some of the popular document databases we have seen are MongoDB [MongoDB], CouchDB [CouchDB], Terrastore [Terrastore], OrientDB [OrientDB], RavenDB [RavenDB], and of course the well-known and often reviled Lotus Notes [Notes Storage Facility] that uses document storage.

9.2. Features

While there are many specialized document databases, we will use MongoDB as a representative of the feature set. Keep in mind that each product has some features that may not be found in other document databases.

Let’s take some time to understand how MongoDB works. Each MongoDB instance has multiple databases, and each database can have multiple collections. When we compare this with RDBMS, an RDBMS instance is the same as a MongoDB instance, the schemas in RDBMS are similar to MongoDB databases, and the RDBMS tables are collections in MongoDB. When we store a document, we have to choose which database and collection this document belongs in, for example, database.collection.insert(document), which is usually represented as db.coll.insert(document).

9.2.1. Consistency

Consistency in a MongoDB database is configured by using the replica sets and choosing to wait for the writes to be replicated to all the slaves or a given number of slaves. Every write can specify the number of servers the write has to be propagated to before it returns as successful.

A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells the database how strong a consistency you want. For example, if you have one server and specify the w as majority, the write will return immediately since there is only one node. If you have three nodes in the replica set and specify w as majority, the write will have to complete at a minimum of two nodes before it is reported as a success. You can increase the w value for stronger consistency but you will suffer on write performance, since now the writes have to complete at more nodes. Replica sets also allow you to increase the read performance by allowing reading from slaves by setting slaveOk; this parameter can be set on the connection, or database, or collection, or individually for each operation.

    Mongo mongo = new Mongo("localhost:27017");
    mongo.slaveOk();

Here we are setting slaveOk per operation, so that we can decide which operations can work with data from the slave node.

    DBCollection collection = getOrderCollection();
    BasicDBObject query = new BasicDBObject();
    query.put("name", "Martin");
    DBCursor cursor = collection.find(query).slaveOk();

Similar to various options available for read, you can change the settings to achieve strong write consistency, if desired. By default, a write is reported successful once the database receives it; you can change this so as to wait for the writes to be synced to disk or to propagate to two or more slaves. This is known as WriteConcern: You make sure that certain writes are written to the master and some slaves by setting WriteConcern to REPLICAS_SAFE. Shown below is code where we are setting the WriteConcern for all writes to a collection:

    DBCollection shopping = database.getCollection("shopping");
    shopping.setWriteConcern(REPLICAS_SAFE);

WriteConcern can also be set per operation by specifying it on the write command:

    WriteResult result = shopping.insert(order, REPLICAS_SAFE);

There is a tradeoff that you need to carefully think about, based on your application needs and business requirements, to decide what settings make sense for slaveOk during read or what safety level you desire during write with WriteConcern.
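The node counts quoted for w set to majority follow from simple integer arithmetic, sketched below. The class WriteMajority is hypothetical and is not part of the MongoDB driver; it assumes every member of the replica set counts toward the majority, which may not match how a particular MongoDB version treats non-voting members.

```java
// Illustrative arithmetic only; not a MongoDB driver call.
final class WriteMajority {
    // Smallest number of nodes that forms a strict majority of the replica set.
    static int nodesRequired(int replicaSetSize) {
        if (replicaSetSize < 1) {
            throw new IllegalArgumentException("replica set needs at least one node");
        }
        return replicaSetSize / 2 + 1;
    }
}
```

A single server needs only itself, so the write returns immediately; three nodes need two acknowledgements and five nodes need three. Each extra acknowledgement buys stronger consistency at the cost of a slower write.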

9.2.2. Transactions

Transactions, in the traditional RDBMS sense, mean that you can start modifying the database with insert, update, or delete commands over different tables and then decide if you want to keep the changes or not by using commit or rollback. These constructs are generally not available in NoSQL solutions; a write either succeeds or fails. Transactions at the single-document level are known as atomic transactions. Transactions involving more than one operation are not possible, although there are products such as RavenDB that do support transactions across multiple operations.

By default, all writes are reported as successful. A finer control over the write can be achieved by using the WriteConcern parameter. We ensure that order is written to more than one node before it’s reported successful by using WriteConcern.REPLICAS_SAFE. Different levels of WriteConcern let you choose the safety level during writes; for example, when writing log entries, you can use the lowest level of safety, WriteConcern.NONE.

    final Mongo mongo = new Mongo(mongoURI);
    mongo.setWriteConcern(REPLICAS_SAFE);
    DBCollection shopping = mongo.getDB(orderDatabase)
                                 .getCollection(shoppingCollection);
    try {
      WriteResult result = shopping.insert(order, REPLICAS_SAFE);
      // Writes made it to primary and at least one secondary
    } catch (MongoException writeException) {
      // Writes did not make it to minimum of two nodes including primary
      dealWithWriteFailure(order, writeException);
    }

9.2.3. Availability

The CAP theorem (“The CAP Theorem,” p. 53) dictates that we can have only two of Consistency, Availability, and Partition Tolerance. Document databases try to improve on availability by replicating data using the master-slave setup. The same data is available on multiple nodes and the clients can get to the data even when the primary node is down. Usually, the application code does not have to determine if the primary node is available or not. MongoDB implements replication, providing high availability using replica sets.

In a replica set, there are two or more nodes participating in an asynchronous master-slave replication. The replica-set nodes elect the master, or primary, among themselves. Assuming all the nodes have equal voting rights, some nodes can be favored for being closer to the other servers, for having more RAM, and so on; users can affect this by assigning a priority (a number between 0 and 1000) to a node.

All requests go to the master node, and the data is replicated to the slave nodes. If the master node goes down, the remaining nodes in the replica set vote among themselves to elect a new master; all future requests are routed to the new master, and the slave nodes start getting data from the new master. When the node that failed comes back online, it joins in as a slave and catches up with the rest of the nodes by pulling all the data it needs to get current.

Figure 9.1 is an example configuration of replica sets. We have two nodes, mongoA and mongoB, running the MongoDB database in the primary datacenter, and mongoC in the secondary datacenter. If we want nodes in the primary datacenter to be elected as primary nodes, we can assign them a higher priority than the other nodes. More nodes can be added to the replica sets without having to take them offline.

Figure 9.1. Replica set configuration with higher priority assigned to nodes in the same datacenter

The application writes or reads from the primary (master) node. When connection is established, the application only needs to connect to one node (primary or not, does not matter) in the replica set, and the rest of the nodes are discovered automatically. When the primary node goes down, the driver talks to the new primary elected by the replica set. The application does not have to manage any of the communication failures or node selection criteria. Using replica sets gives you the ability to have a highly available document data store.

Replica sets are generally used for data redundancy, automated failover, read scaling, server maintenance without downtime, and disaster recovery. Similar availability setups can be achieved with CouchDB, RavenDB, Terrastore, and other products.

9.2.4. Query Features

Document databases provide different query features. CouchDB allows you to query via views: complex queries on documents which can be either materialized (“Materialized Views,” p. 30) or dynamic (think of them as RDBMS views which are either materialized or not). With CouchDB, if you need to aggregate the number of reviews for a product as well as the average rating, you could add a view implemented via map-reduce (“Basic Map-Reduce,” p. 68) to return the count of reviews and the average of their ratings.

When there are many requests, you don’t want to compute the count and average for every request; instead you can add a materialized view that precomputes the values and stores the results in the database. These materialized views are updated when queried, if any data was changed since the last update.

One of the good features of document databases, as compared to key-value stores, is that we can query the data inside the document without having to retrieve the whole document by its key and then introspect the document. This feature brings these databases closer to the RDBMS query model.

MongoDB has a query language which is expressed via JSON and has constructs such as $query for the where clause, $orderby for sorting the data, or $explain to show the execution plan of the query. There are many more constructs like these that can be combined to create a MongoDB query.

Let’s look at certain queries that we can do against MongoDB. Suppose we want to return all the documents in an order collection (all rows in the order table). The SQL for this would be:

SELECT * FROM order

The equivalent query in Mongo shell would be:

db.order.find()

Selecting the orders for a single customerId of 883c2c5b4e5b would be:

SELECT * FROM order WHERE customerId = "883c2c5b4e5b"

The equivalent query in Mongo to get all orders for a single customerId of 883c2c5b4e5b:

db.order.find({"customerId":"883c2c5b4e5b"})

Similarly, selecting orderId and orderDate for one customer in SQL would be:

SELECT orderId, orderDate FROM order WHERE customerId = "883c2c5b4e5b"

and the equivalent in Mongo would be:

db.order.find({customerId:"883c2c5b4e5b"},{orderId:1,orderDate:1})

Similarly, queries to count, sum, and so on are all available. Since the documents are aggregated objects, it is really easy to query for documents that have to be matched using the fields with child objects. Let’s say we want to query for all the orders where one of the items ordered has a name like Refactoring. The SQL for this requirement would be:

SELECT * FROM customerOrder, orderItem, product
WHERE customerOrder.orderId = orderItem.customerOrderId
AND orderItem.productId = product.productId
AND product.name LIKE '%Refactoring%'

and the equivalent Mongo query would be:

db.orders.find({"items.product.name":/Refactoring/})

The query for MongoDB is simpler because the objects are embedded inside a single document and you can query based on the embedded child documents.
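To see why no join is needed, it can help to picture an order as a nested structure that is searched in place. The sketch below is an in-memory illustration in plain Java, assuming documents are represented as maps and lists; it is not how MongoDB evaluates queries, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// In-memory illustration only; not MongoDB code.
final class EmbeddedQuerySketch {

    // True if any value reachable through the dotted path matches the pattern.
    // Lists are searched element by element, as in "items.product.name".
    static boolean matches(Object current, String[] path, int depth, Pattern pattern) {
        if (current instanceof List) {
            for (Object element : (List<?>) current) {
                if (matches(element, path, depth, pattern)) {
                    return true;
                }
            }
            return false;
        }
        if (depth == path.length) {
            return current instanceof String && pattern.matcher((String) current).find();
        }
        if (current instanceof Map) {
            return matches(((Map<?, ?>) current).get(path[depth]), path, depth + 1, pattern);
        }
        return false;
    }

    // Returns the orders whose embedded data matches, without any joins.
    static List<Map<String, Object>> find(List<Map<String, Object>> orders,
                                          String dottedPath, String regex) {
        Pattern pattern = Pattern.compile(regex);
        String[] path = dottedPath.split("\\.");
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> order : orders) {
            if (matches(order, path, 0, pattern)) {
                result.add(order);
            }
        }
        return result;
    }

    // Builds an order document with one embedded item per product name.
    static Map<String, Object> order(String orderId, String... productNames) {
        List<Object> items = new ArrayList<>();
        for (String name : productNames) {
            Map<String, Object> product = new HashMap<>();
            product.put("name", name);
            Map<String, Object> item = new HashMap<>();
            item.put("product", product);
            items.add(item);
        }
        Map<String, Object> order = new HashMap<>();
        order.put("orderId", orderId);
        order.put("items", items);
        return order;
    }
}
```

Calling find(orders, "items.product.name", "Refactoring") walks each order aggregate directly, which is the in-memory counterpart of the three-table join in the SQL version.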

9.2.5. Scaling

The idea of scaling is to add nodes or change data storage without simply migrating the database to a bigger box. We are not talking about making application changes to handle more load; instead, we are interested in what features are in the database so that it can handle more load.

Scaling for heavy-read loads can be achieved by adding more read slaves, so that all the reads can be directed to the slaves. Given a heavy-read application, with our 3-node replica-set cluster, we can add more read capacity to the cluster as the read load increases just by adding more slave nodes to the replica set to execute reads with the slaveOk flag (Figure 9.2). This is horizontal scaling for reads.

Figure 9.2. Adding a new node, mongoD, to an existing replica-set cluster

Once the new node, mongoD, is started, it needs to be added to the replica set.

rs.add("mongod:27017");

When a new node is added, it will sync up with the existing nodes, join the replica set as a secondary node, and start serving read requests. An advantage of this setup is that we do not have to restart any other nodes, and there is no downtime for the application either.

When we want to scale for write, we can start sharding (“Sharding,” p. 38) the data. Sharding is similar to partitions in RDBMS where we split data by value in a certain column, such as state or year. With RDBMS, partitions are usually on the same node, so the client application does not have to query a specific partition but can keep querying the base table; the RDBMS takes care of finding the right partition for the query and returns the data.

In sharding, the data is also split by a certain field, but then moved to different Mongo nodes. The data is dynamically moved between nodes to ensure that shards are always balanced. We can add more nodes to the cluster and increase the number of writable nodes, enabling horizontal scaling for writes.

db.runCommand({shardcollection:"ecommerce.customer",key:{firstname:1}})

Splitting the data on the first name of the customer ensures that the data is balanced across the shards for optimal write performance; furthermore, each shard can be a replica set ensuring better read performance within the shard (Figure 9.3). When we add a new shard to this existing sharded cluster, the data will now be balanced across four shards instead of three. As all this data movement and infrastructure refactoring is happening, the application will not experience any downtime, although the cluster may not perform optimally when large amounts of data are being moved to rebalance the shards.

Figure 9.3. MongoDB sharded setup where each shard is a replica set

The shard key plays an important role. You may want to place your MongoDB database shards closer to their users, so sharding based on user location may be a good idea. When sharding by customer location, all user data for the East Coast of the USA is in the shards that are served from the East Coast, and all user data for the West Coast is in the shards that are on the West Coast.
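The idea of sharding by location can be pictured as a lookup from the shard-key value to the shard that holds the data. The sketch below is conceptual only: the region and shard names are made up, and MongoDB's own routing assigns data to shards based on the shard key rather than through a map like this one.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch; the names are hypothetical and this is not MongoDB's router.
final class LocationShardSketch {

    private final Map<String, String> shardByRegion = new HashMap<>();

    // Declares which shard holds the data for a region.
    void assign(String region, String shard) {
        shardByRegion.put(region, shard);
    }

    // Picks the shard for a customer from the shard key (here, the customer's region).
    String shardFor(String customerRegion) {
        String shard = shardByRegion.get(customerRegion);
        if (shard == null) {
            throw new IllegalArgumentException("no shard assigned for region " + customerRegion);
        }
        return shard;
    }
}
```

The point of the design is that every read and write for an East Coast customer lands on a shard that is physically close to that customer.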

9.3. Suitable Use Cases

9.3.1. Event Logging

Applications have different event logging needs; within the enterprise, there are many different applications that want to log events. Document databases can store all these different types of events and can act as a central data store for event storage. This is especially true when the type of data being captured by the events keeps changing. Events can be sharded by the name of the application where the event originated or by the type of event such as order_processed or customer_logged.

9.3.2. Content Management Systems, Blogging Platforms

Since document databases have no predefined schemas and usually understand JSON documents, they work well in content management systems or applications for publishing websites, managing user comments, user registrations, profiles, web-facing documents.

9.3.3. Web Analytics or Real-Time Analytics

Document databases can store data for real-time analytics; since parts of the document can be updated, it’s very easy to store page views or unique visitors, and new metrics can be easily added without schema changes.

9.3.4. E-Commerce Applications

E-commerce applications often need to have flexible schema for products and orders, as well as the ability to evolve their data models without expensive database refactoring or data migration (“Schema Changes in a NoSQL Data Store,” p. 128).

9.4. When Not to Use

There are problem spaces where document databases are not the best solution.

9.4.1. Complex Transactions Spanning Different Operations

If you need to have atomic cross-document operations, then document databases may not be for you. However, there are some document databases that do support these kinds of operations, such as RavenDB.

9.4.2. Queries against Varying Aggregate Structure

Flexible schema means that the database does not enforce any restrictions on the schema. Data is saved in the form of application entities. If you need to query these entities ad hoc, your queries will be changing (in RDBMS terms, this would mean that as you join criteria between tables, the tables to join keep changing). Since the data is saved as an aggregate, if the design of the aggregate is constantly changing, you need to save the aggregates at the lowest level of granularity; basically, you need to normalize the data. In this scenario, document databases may not work.

Chapter 10. Column-Family Stores

Column-family stores, such as Cassandra [Cassandra], HBase [Hbase], Hypertable [Hypertable], and Amazon SimpleDB [Amazon SimpleDB], allow you to store data with keys mapped to values and the values grouped into multiple column families, each column family being a map of data.

10.1. What Is a Column-Family Data Store?

There are many column-family databases. In this chapter, we will talk about Cassandra but also reference other column-family databases to discuss features that may be of interest in particular scenarios.

Column-family databases store data in column families as rows that have many columns associated with a row key (Figure 10.1). Column families are groups of related data that is often accessed together. For a Customer, we would often access their Profile information at the same time, but not their Orders.

Figure 10.1. Cassandra’s data model with column families

Cassandra is one of the popular column-family databases; there are others, such as HBase, Hypertable, and Amazon DynamoDB [Amazon DynamoDB]. Cassandra can be described as fast and easily scalable with write operations spread across the cluster. The cluster does not have a master node, so any read and write can be handled by any node in the cluster.

10.2. Features

Let’s start by looking at how data is structured in Cassandra. The basic unit of storage in Cassandra is a column. A Cassandra column consists of a name-value pair where the name also behaves as the key. Each of these key-value pairs is a single column and is always stored with a timestamp value. The timestamp is used to expire data, resolve write conflicts, deal with stale data, and do other things. Once the column data is no longer used, the space can be reclaimed later during a compaction phase.

{
  name: "fullName",
  value: "Martin Fowler",
  timestamp: 12345667890
}

The column has a key of fullName and the value of Martin Fowler and has a timestamp attached to it. A row is a collection of columns attached or linked to a key; a collection of similar rows makes a column family. When the columns in a column family are simple columns, the column family is known as a standard column family.

//column family
{
  //row
  "pramod-sadalage" : {
    firstName: "Pramod",
    lastName: "Sadalage",
    lastVisit: "2012/12/12"
  }
  //row
  "martin-fowler" : {
    firstName: "Martin",
    lastName: "Fowler",
    location: "Boston"
  }
}
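The column family shown above can be modeled in plain Java as a map of row keys to maps of columns. This is a toy sketch, not Cassandra code: the class names are hypothetical, and the rule that the column with the newer timestamp wins (with ties going to the later write) is an assumption made here to illustrate how a timestamp can resolve write conflicts.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of a standard column family; names are illustrative.
final class ColumnFamilySketch {

    // A column: a name-value pair that always carries a timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> (column name -> column); each row keeps its own set of columns.
    private final Map<String, Map<String, Column>> rows = new TreeMap<>();

    // Stores the column unless the row already holds a newer version of it.
    void put(String rowKey, Column column) {
        Map<String, Column> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<>();
            rows.put(rowKey, row);
        }
        Column existing = row.get(column.name);
        if (existing == null || column.timestamp >= existing.timestamp) {
            row.put(column.name, column);
        }
    }

    // Returns the value of one column, or null if the row does not have it.
    String get(String rowKey, String columnName) {
        Map<String, Column> row = rows.get(rowKey);
        if (row == null || !row.containsKey(columnName)) {
            return null;
        }
        return row.get(columnName).value;
    }
}
```

Because each row owns its own map, the martin-fowler row can carry a location column while the pramod-sadalage row carries lastVisit, without either row having to know about the other's columns.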

Each column family can be compared to a container of rows in an RDBMS table where the key identifies the row and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, and columns can be added to any row at any time without having to add them to other rows. We have the pramod-sadalage row and the martin-fowler row with different columns; both rows are part of the column family.

When a column consists of a map of columns, then we have a super column. A super column consists of a name and a value which is a map of columns. Think of a super column as a container of columns.

{
  name: "book:978-0767905923",
  value: {
    author: "Mitch Albon",
    title: "Tuesdays with Morrie",
    isbn: "978-0767905923"
  }
}

When we use super columns to create a column family, we get a super column family.

//super column family
{
  //row
  name: "billing:martin-fowler",
  value: {
    address: {
      name: "address:default",
      value: {
        fullName: "Martin Fowler",
        street: "100 N. Main Street",
        zip: "20145"
      }
    },
    billing: {
      name: "billing:default",
      value: {
        creditcard: "8888-8888-8888-8888",
        expDate: "12/2016"
      }
    }
  }
  //row
  name: "billing:pramod-sadalage",
  value: {
    address: {
      name: "address:default",
      value: {
        fullName: "Pramod Sadalage",
        street: "100 E. State Parkway",
        zip: "54130"
      }
    },
    billing: {
      name: "billing:default",
      value: {
        creditcard: "9999-8888-7777-4444",
        expDate: "01/2016"
      }
    }
  }
}

Super column families are good to keep related data together, but when some of the columns are not needed most of the time, the columns are still fetched and deserialized by Cassandra, which may not be optimal.

Cassandra puts the standard and super column families into keyspaces. A keyspace is similar to a database in RDBMS where all column families related to the application are stored. Keyspaces have to be created so that column families can be assigned to them:

create keyspace ecommerce

10.2.1. Consistency

When a write is received by Cassandra, the data is first recorded in a commit log, then written to an in-memory structure known as memtable. A write operation is considered successful once it’s written to the commit log and the memtable. Writes are batched in memory and periodically written out to structures known as SSTable. SSTables are not written to again after they are flushed; if there are changes to the data, a new SSTable is written. Unused SSTables are reclaimed by compaction.

Let’s look at the read operation to see how consistency settings affect it. If we have a consistency setting of ONE as the default for all read operations, then when a read request is made, Cassandra returns the data from the first replica, even if the data is stale. If the data is stale, subsequent reads will get the latest (newest) data; this process is known as read repair. The low consistency level is good to use when you do not care if you get stale data and/or if you have high read performance requirements.

Similarly, if you are doing writes, Cassandra would write to one node’s commit log and return a response to the client. The consistency of ONE is good if you have very high write performance requirements and also do not mind if some writes are lost, which may happen if the node goes down before the write is replicated to other nodes.

quorum = new ConfigurableConsistencyLevel();
quorum.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
quorum.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

Using the QUORUM consistency setting for both read and write operations ensures that a majority of the nodes respond to the read and the column with the newest timestamp is returned back to the client, while the replicas that do not have the newest data are repaired via the read repair operations. During write operations, the QUORUM consistency setting means that the write has to propagate to the majority of the nodes before it is considered successful and the client is notified.

Using ALL as consistency level means that all nodes will have to respond to reads or writes, which will make the cluster not tolerant to faults; even when one node is down, the write or read is blocked and reported as a failure. It’s therefore up to the system designers to tune the consistency levels as the application requirements change. Within the same application, there may be different requirements of consistency; they can also change based on each operation. For example, showing review comments for a product has different consistency requirements compared to reading the status of the last order placed by the customer.

During keyspace creation, we can configure how many replicas of the data we need to store. This number determines the replication factor of the data. If you have a replication factor of 3, the data is copied onto three nodes. When writing and reading data with Cassandra, if you specify the consistency values of 2, you get that R + W is greater than the replication factor (2 + 2 > 3), which gives you better consistency during writes and reads.

We can run the node repair command for the keyspace and force Cassandra to compare every key it’s responsible for with the rest of the replicas. As this operation is expensive, we can also just repair a specific column family or a list of column families:

repair ecommerce

repair ecommerce customerInfo

While a node is down, the data that was supposed to be stored by that node is handed off to other nodes. As the node comes back online, the changes made to the data are handed back to the node. This technique is known as hinted handoff. Hinted handoff allows for faster restore of failed nodes.

10.2.2. Transactions

Cassandra does not have transactions in the traditional sense, where we could start multiple writes and then decide if we want to commit the changes or not. In Cassandra, a write is atomic at the row level, which means inserting or updating columns for a given row key will be treated as a single write and will either succeed or fail. Writes are first written to commit logs and memtables, and are only considered good when the write to commit log and memtable was successful. If a node goes down, the commit log is used to apply changes to the node, just like the redo log in Oracle.

You can use external transaction libraries, such as ZooKeeper [ZooKeeper], to synchronize your writes and reads. There are also libraries such as Cages [Cages] that allow you to wrap your transactions over ZooKeeper.

10.2.3. Availability

Cassandra is by design highly available, since there is no master in the cluster and every node is a peer in the cluster. The availability of a cluster can be increased by reducing the consistency level of the requests. Availability is governed by the (R + W) > N formula (“Quorums,” p. 57) where W is the minimum number of nodes where the write must be successfully written, R is the minimum number of nodes that must respond successfully to a read, and N is the number of nodes participating in the replication of data. You can tune the availability by changing the R and W values for a fixed value of N.

In a 10-node Cassandra cluster with a replication factor for the keyspace set to 3 (N = 3), if we set R = 2 and W = 2, then we have (2 + 2) > 3. In this scenario, when one node goes down, availability is not affected much, as the data can be retrieved from the other two nodes. If W = 2 and R = 1, when two nodes are down the cluster is not available for write but we can still read. Similarly, if R = 2 and W = 1, we can write but the cluster is not available for read. With the R + W > N equation, you are making conscious decisions about consistency tradeoffs.

You should set up your keyspaces and read/write operations based on your needs: higher availability for write or higher availability for read.
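The R, W, and N scenarios above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not Cassandra or Hector API; the class and method names are hypothetical, and "replicas down" is assumed to mean replicas of the data being read or written.

```java
// Illustrative arithmetic only; not Cassandra or Hector API.
final class QuorumSketch {

    // R + W > N: every set of R replicas read overlaps every set of W replicas written.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    // An operation that needs `required` replicas can proceed
    // only if that many of the N replicas are still up.
    static boolean available(int required, int n, int replicasDown) {
        return n - replicasDown >= required;
    }
}
```

For N = 3 with two replicas down, available(2, 3, 2) is false and available(1, 3, 2) is true, which is why W = 2, R = 1 can still read but not write, and R = 2, W = 1 can still write but not read.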

10.2.4. Query Features

When designing the data model in Cassandra, it is advised to make the columns and column families optimized for reading the data, as it does not have a rich query language; as data is inserted in the column families, data in each row is sorted by column names. If we have a column that is retrieved much more often than other columns, it’s better performance-wise to use that value for the row key instead.

10.2.4.1. Basic Queries

Basic queries that can be run using a Cassandra client include the GET, SET, and DEL. Before starting to query for data, we have to issue the keyspace command use ecommerce;. This ensures that all of our queries are run against the keyspace that we put our data into. Before starting to use the column family in the keyspace, we have to define the column family.

CREATE COLUMN FAMILY Customer
WITH comparator = UTF8Type
AND key_validation_class = UTF8Type
AND column_metadata = [
  {column_name: city, validation_class: UTF8Type},
  {column_name: name, validation_class: UTF8Type},
  {column_name: web, validation_class: UTF8Type}
];

We have a column family named Customer with name, city, and web columns, and we are inserting data in the column family with a Cassandra client.

SET Customer['mfowler']['city']='Boston';
SET Customer['mfowler']['name']='Martin Fowler';
SET Customer['mfowler']['web']='www.martinfowler.com';

Using the Hector [Hector] Java client, we can insert the same data in the column family.

ColumnFamilyTemplate<String, String> template = cassandra.getColumnFamilyTemplate();
ColumnFamilyUpdater<String, String> updater = template.createUpdater(key);
for (String name : values.keySet()) {
  updater.setString(name, values.get(name));
}
try {
  template.update(updater);
} catch (HectorException e) {
  handleException(e);
}

We can read the data back using the GET command. There are multiple ways to get the data; we can get all the columns stored for a row key in the column family.

GET Customer['mfowler'];

We can even get just the column we are interested in from the column family.

GET Customer['mfowler']['web'];

Getting the specific column we need is more efficient, as only the data we care about is returned, which saves lots of data movement, especially when the column family has a large number of columns. Updating the data is the same as using the SET command for the column that needs to be set to the new value. Using the DEL command, we can delete either a single column or the entire row.

DEL Customer['mfowler']['city'];

DEL Customer['mfowler'];

10.2.4.2. Advanced Queries and Indexing

Cassandra allows you to index columns other than the keys for the column family. We can define an index on the city column.

UPDATE COLUMN FAMILY Customer
WITH comparator = UTF8Type
AND column_metadata = [{column_name: city, validation_class: UTF8Type, index_type: KEYS}];

We can now query directly against the indexed column.

GET Customer WHERE city = 'Boston';

These indexes are implemented as bit-mapped indexes and perform well for low-cardinality column values.

10.2.4.3. Cassandra Query Language (CQL)

Cassandra has a query language that supports SQL-like commands, known as Cassandra Query Language (CQL). We can use the CQL commands to create a column family.

CREATE COLUMNFAMILY Customer (
  KEY varchar PRIMARY KEY,
  name varchar,
  city varchar,
  web varchar);

We insert the same data using CQL.

INSERT INTO Customer (KEY,name,city,web)
VALUES ('mfowler', 'Martin Fowler', 'Boston', 'www.martinfowler.com');

We can read data using the SELECT command. Here we read all the columns:

SELECT * FROM Customer

Or, we could just SELECT the columns we need.

SELECT name,web FROM Customer

Indexes on columns are created using the CREATE INDEX command, and can then be used to query the data.

SELECT name,web FROM Customer WHERE city='Boston'

CQL has many more features for querying data, but it does not have all the features that SQL has. CQL does not allow joins or subqueries, and its where clauses are typically simple.

10.2.5. Scaling

Scaling an existing Cassandra cluster is a matter of adding more nodes. As no single node is a master, when we add nodes to the cluster we are improving the capacity of the cluster to support more writes and reads. This type of horizontal scaling allows you to have maximum uptime, as the cluster keeps serving requests from the clients while new nodes are being added to the cluster.

10.3. Suitable Use Cases

Let’s discuss some of the problems where column-family databases are a good fit.

10.3.1. Event Logging

Column-family databases with their ability to store any data structures are a great choice to store event information, such as application state or errors encountered by the application. Within the enterprise, all applications can write their events to Cassandra with their own columns and the row key of the form appname:timestamp. Since we can scale writes, Cassandra would work ideally for an event logging system (Figure 10.2).

Figure 10.2. Event logging with Cassandra

10.3.2. Content Management Systems, Blogging Platforms

Using column families, you can store blog entries with tags, categories, links, and trackbacks in different columns. Comments can be either stored in the same row or moved to a different keyspace; similarly, blog users and the actual blogs can be put into different column families.

10.3.3. Counters

Often, in web applications you need to count and categorize visitors of a page to calculate analytics. You can use the CounterColumnType during creation of a column family.

CREATE COLUMN FAMILY visit_counter
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

Once a column family is created, you can have arbitrary columns for each page visited within the web application for every user.

INCR visit_counter['mfowler'][home] BY 1;
INCR visit_counter['mfowler'][products] BY 1;
INCR visit_counter['mfowler'][contactus] BY 1;

Incrementing counters using CQL:

UPDATE visit_counter SET home = home + 1 WHERE KEY='mfowler'

10.3.4. Expiring Usage

You may provide demo access to users, or may want to show ad banners on a website for a specific time. You can do this by using expiring columns: Cassandra allows you to have columns which, after a given time, are deleted automatically. This time is known as TTL (Time To Live) and is defined in seconds. The column is deleted after the TTL has elapsed; when the column does not exist, the access can be revoked or the banner can be removed.

SET Customer['mfowler']['demo_access'] = 'allowed' WITH ttl=2592000;
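The TTL of 2592000 seconds in the command above is 30 days (30 × 24 × 3600). The behavior of an expiring column can be sketched in plain Java as follows. This is a toy model and not Cassandra code: the class is hypothetical, and the current time is passed in by the caller so that the expiry rule is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of expiring columns; the caller supplies the current time in seconds.
final class ExpiringColumnsSketch {

    private static final class Entry {
        final String value;
        final long expiresAtSeconds;

        Entry(String value, long expiresAtSeconds) {
            this.value = value;
            this.expiresAtSeconds = expiresAtSeconds;
        }
    }

    private final Map<String, Entry> columns = new HashMap<>();

    // Stores a column that stops being visible ttlSeconds after nowSeconds.
    void set(String name, String value, long ttlSeconds, long nowSeconds) {
        columns.put(name, new Entry(value, nowSeconds + ttlSeconds));
    }

    // Returns the value, or null once the TTL has elapsed (the column is then dropped).
    String get(String name, long nowSeconds) {
        Entry entry = columns.get(name);
        if (entry == null) {
            return null;
        }
        if (nowSeconds >= entry.expiresAtSeconds) {
            columns.remove(name);
            return null;
        }
        return entry.value;
    }
}
```

An application checking demo access would simply read the demo_access column: a null result means the TTL has elapsed and the access can be revoked.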

10.4. When Not to Use

There are problems for which column-family databases are not the best solutions, such as systems that require ACID transactions for writes and reads. If you need the database to aggregate the data using queries (such as SUM or AVG), you have to do this on the client side using data retrieved by the client from all the rows.

Cassandra is not great for early prototypes or initial tech spikes: During the early stages, we are not sure how the query patterns may change, and as the query patterns change, we have to change the column family design. This causes friction for the product innovation team and slows down developer productivity. RDBMS impose high cost on schema change, which is traded off for a low cost of query change; in Cassandra, the cost may be higher for query change as compared to schema change.

Chapter 11. Graph Databases

Graph databases allow you to store entities and relationships between these entities. Entities are also known as nodes, which have properties. Think of a node as an instance of an object in the application. Relations are known as edges that can have properties. Edges have directional significance; nodes are organized by relationships which allow you to find interesting patterns between the nodes. The organization of the graph lets the data be stored once and then interpreted in different ways based on relationships.

11.1. What Is a Graph Database?

In the example graph in Figure 11.1, we see a bunch of nodes related to each other. Nodes are entities that have properties, such as name. The node of Martin is actually a node that has a property of name set to Martin.

Figure 11.1. An example graph structure

We also see that edges have types, such as likes, author, and so on. These properties let us organize the nodes; for example, the nodes Martin and Pramod have an edge connecting them with a relationship type of friend. Edges can have multiple properties. We can assign a property of since on the friend relationship type between Martin and Pramod. Relationship types have directional significance; the friend relationship type is bidirectional but likes is not. When Dawn likes NoSQL Distilled, it does not automatically mean NoSQL Distilled likes Dawn.

Once we have a graph of these nodes and edges created, we can query the graph in many ways, such as “get all nodes employed by Big Co that like NoSQL Distilled.” A query on the graph is also known as traversing the graph. An advantage of the graph databases is that we can change the traversing requirements without having to change the nodes or edges. If we want to “get all nodes that like NoSQL Distilled,” we can do so without having to change the existing data or the model of the database, because we can traverse the graph any way we like.

Usually, when we store a graph-like structure in RDBMS, it’s for a single type of relationship (“who is my manager” is a common example). Adding another relationship to the mix usually means a lot of schema changes and data movement, which is not the case when we are using graph databases. Similarly, in relational databases we model the graph beforehand based on the Traversal we want; if the Traversal changes, the data will have to change.

In graph databases, traversing the joins or relationships is very fast. The relationship between nodes is not calculated at query time but is actually persisted as a relationship. Traversing persisted relationships is faster than calculating them for every query.

Nodes can have different types of relationships between them, allowing you to both represent relationships between the domain entities and to have secondary relationships for things like category, path, time-trees, quad-trees for spatial indexing, or linked lists for sorted access. Since there is no limit to the number and kind of relationships a node can have, they can all be represented in the same graph database.

11.2. Features

There are many graph databases available, such as Neo4J [Neo4J], InfiniteGraph [InfiniteGraph], OrientDB [OrientDB], or FlockDB [FlockDB] (which is a special case: a graph database that only supports single-depth relationships or adjacency lists, where you cannot traverse more than one level deep for relationships). We will take Neo4J as a representative of the graph database solutions to discuss how they work and how they can be used to solve application problems.

In Neo4J, creating a graph is as simple as creating two nodes and then creating a relationship. Let’s create two nodes, Martin and Pramod:

Node martin = graphDb.createNode();
martin.setProperty("name", "Martin");

Node pramod = graphDb.createNode();
pramod.setProperty("name", "Pramod");

We have assigned the name property of the two nodes the values of Martin and Pramod. Once we have more than one node, we can create a relationship:

martin.createRelationshipTo(pramod, FRIEND);

pramod.createRelationshipTo(martin, FRIEND);

We have to create a relationship between the nodes in both directions, for the direction of the relationship matters: For example, a product node can be liked by a user but the product cannot like the user. This directionality helps in designing a rich domain model (Figure 11.2). Nodes know about INCOMING and OUTGOING relationships that are traversable both ways.

Figure 11.2. Relationships with properties

Relationships are first-class citizens in graph databases; most of the value of graph databases is derived from the relationships. Relationships don’t only have a type, a start node, and an end node, but can have properties of their own. Using these properties on the relationships, we can add intelligence to the relationship; for example, since when did they become friends, what is the distance between the nodes, or what aspects are shared between the nodes. These properties on the relationships can be used to query the graph.

Since most of the power from the graph databases comes from the relationships and their properties, a lot of thought and design work is needed to model the relationships in the domain that we are trying to work with. Adding new relationship types is easy; changing existing nodes and their relationships is similar to data migration (“Migrations in Graph Databases,” p. 131), because these changes will have to be done on each node and each relationship in the existing data.

11.2.1. Consistency

Since graph databases are operating on connected nodes, most graph database solutions usually do not support distributing the nodes on different servers. There are some solutions, however, that support node distribution across a cluster of servers, such as InfiniteGraph. Within a single server, data is always consistent, especially in Neo4J which is fully ACID-compliant. When running Neo4J in a cluster, a write to the master is eventually synchronized to the slaves, while slaves are always available for read. Writes to slaves are allowed and are immediately synchronized to the master; other slaves will not be synchronized immediately, though; they will have to wait for the data to propagate from the master.

Graph databases ensure consistency through transactions. They do not allow dangling relationships: The start node and end node always have to exist, and nodes can only be deleted if they don’t have any relationships attached to them.
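The two rules just described, directed relationships and no dangling relationships, can be sketched with a small in-memory graph. This is a toy model in plain Java with hypothetical class and method names; it is not the Neo4J API, and returning false from a rejected delete is a simplification chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory graph; class and method names are illustrative, not Neo4J API.
final class GraphSketch {

    // A directed, typed relationship from a start node to an end node.
    static final class Relationship {
        final String start;
        final String type;
        final String end;

        Relationship(String start, String type, String end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    private final Set<String> nodes = new HashSet<>();
    private final List<Relationship> relationships = new ArrayList<>();

    void createNode(String name) {
        nodes.add(name);
    }

    // Both endpoints must already exist, so no relationship can dangle.
    void relate(String start, String type, String end) {
        if (!nodes.contains(start) || !nodes.contains(end)) {
            throw new IllegalStateException("start and end node must exist");
        }
        relationships.add(new Relationship(start, type, end));
    }

    // True only for the stored direction: relate(a, t, b) does not imply (b, t, a).
    boolean isRelated(String start, String type, String end) {
        for (Relationship r : relationships) {
            if (r.start.equals(start) && r.type.equals(type) && r.end.equals(end)) {
                return true;
            }
        }
        return false;
    }

    // A node can be deleted only when no relationship starts or ends at it.
    boolean deleteNode(String name) {
        for (Relationship r : relationships) {
            if (r.start.equals(name) || r.end.equals(name)) {
                return false;
            }
        }
        return nodes.remove(name);
    }
}
```

After relate("Dawn", "likes", "NoSQL Distilled"), the reverse lookup isRelated("NoSQL Distilled", "likes", "Dawn") is false, and deleteNode("Dawn") is refused until the relationship is gone.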

11.2.2. Transactions

Neo4J is ACID-compliant. Before changing any nodes or adding any relationships to existing nodes, we have to start a transaction. Without wrapping operations in transactions, we will get a NotInTransactionException. Read operations can be done without initiating a transaction.

Transaction transaction = database.beginTx();
try {
  Node node = database.createNode();
  node.setProperty("name", "NoSQL Distilled");
  node.setProperty("published", "2012");
  transaction.success();
} finally {
  transaction.finish();
}

In the above code, we started a transaction on the database, then created a node and set properties on it. We marked the transaction as success and finally completed it by finish. A transaction has to be marked as success, otherwise Neo4J assumes that it was a failure and rolls it back when finish is issued. Setting success without issuing finish also does not commit the data to the database. This way of managing transactions has to be remembered when developing, as it differs from the standard way of doing transactions in an RDBMS.

11.2.3. Availability

Neo4J, as of version 1.8, achieves high availability by providing for replicated slaves. These slaves can also handle writes: When they are written to, they synchronize the write to the current master, and the write is committed first at the master and then at the slave. Other slaves will eventually get the update. Other graph databases, such as InfiniteGraph and FlockDB, provide for distributed storage of the nodes.

Neo4J uses the Apache ZooKeeper [ZooKeeper] to keep track of the last transaction IDs persisted on each slave node and the current master node. Once a server starts up, it communicates with ZooKeeper and finds out which server is the master. If the server is the first one to join the cluster, it becomes the master; when a master goes down, the cluster elects a master from the available nodes, thus providing high availability.

11.2.4. Query Features

Graph databases are supported by query languages such as Gremlin [Gremlin]. Gremlin is a domain-specific language for traversing graphs; it can traverse all graph databases that implement the Blueprints [Blueprints] property graph. Neo4J also has the Cypher [Cypher] query language for querying the graph. Outside these query languages, Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or navigate the nodes’ relationships using language bindings.

Properties of a node can be indexed using the indexing service. Similarly, properties of relationships or edges can be indexed, so a node or edge can be found by the value. Indexes should be queried to find the starting node to begin a traversal. Let’s look at searching for the node using node indexing.

If we have the graph shown in Figure 11.1, we can index the nodes as they are added to the database, or we can index all the nodes later by iterating over them. We first need to create an index for the nodes using the IndexManager.

    Index<Node> nodeIndex = graphDb.index().forNodes("nodes");

We are indexing the nodes for the name property. Neo4J uses Lucene [Lucene] as its indexing service. We will see later that we can also use the full-text search capability of Lucene. When new nodes are created, they can be added to the index.

    Transaction transaction = graphDb.beginTx();
    try {
        Index<Node> nodeIndex = graphDb.index().forNodes("nodes");
        nodeIndex.add(martin, "name", martin.getProperty("name"));
        nodeIndex.add(pramod, "name", pramod.getProperty("name"));
        transaction.success();
    } finally {
        transaction.finish();
    }

Adding nodes to the index is done inside the context of a transaction. Once the nodes are indexed, we can search them using the indexed property. If we search for the node with the name of Barbara, we would query the index for the property of name to have a value of Barbara.

    Node node = nodeIndex.get("name", "Barbara").getSingle();

In the same way we can get the node whose name is Martin; given the node, we can get all its relationships.

    Node martin = nodeIndex.get("name", "Martin").getSingle();
    Iterable<Relationship> allRelationships = martin.getRelationships();

We can get either INCOMING or OUTGOING relationships.

    Iterable<Relationship> incomingRelations = martin.getRelationships(Direction.INCOMING);

We can also apply directional filters on the queries when querying for a relationship. With the graph in Figure 11.1, if we want to find all people who like NoSQL Distilled, we can find the NoSQL Distilled node and then get its relationships with Direction.INCOMING. At this point we can also add the type of relationship to the query filter, since we are looking only for nodes that LIKE NoSQL Distilled.

    Node nosqlDistilled = nodeIndex.get("name", "NoSQL Distilled").getSingle();
    relationships = nosqlDistilled.getRelationships(INCOMING, LIKES);
    for (Relationship relationship : relationships) {
        likesNoSQLDistilled.add(relationship.getStartNode());
    }

Finding nodes and their immediate relations is easy, but this can also be achieved in RDBMS databases. Graph databases are really powerful when you want to traverse the graphs at any depth and specify a starting node for the traversal. This is especially useful when you are trying to find nodes that are related to the starting node at more than one level down. As the depth of the graph increases, it makes more sense to traverse the relationships by using a Traverser where you can specify that you are looking for INCOMING, OUTGOING, or BOTH types of relationships. You can also make the traverser go top-down or sideways on the graph by using Order values of DEPTH_FIRST or BREADTH_FIRST, respectively. The traversal has to start at some node; in this example, we try to find all the nodes at any depth that are related as a FRIEND with Barbara:

    Node barbara = nodeIndex.get("name", "Barbara").getSingle();
    Traverser friendsTraverser = barbara.traverse(Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        EdgeType.FRIEND,
        Direction.OUTGOING);

The friendsTraverser provides us a way to find all the nodes that are related to Barbara where the relationship type is FRIEND. The nodes can be at any depth (friend of a friend at any level), allowing you to explore tree structures.

One of the good features of graph databases is finding paths between two nodes: determining if there are multiple paths, finding all of the paths or the shortest path. In the graph in Figure 11.1, we know that Barbara is connected to Jill by two distinct paths; to find all these paths and the distance between Barbara and Jill along those different paths, we can use

    Node barbara = nodeIndex.get("name", "Barbara").getSingle();
    Node jill = nodeIndex.get("name", "Jill").getSingle();
    PathFinder<Path> finder = GraphAlgoFactory.allPaths(
        Traversal.expanderForTypes(FRIEND, Direction.OUTGOING), MAX_DEPTH);
    Iterable<Path> paths = finder.findAllPaths(barbara, jill);

This feature is used in social networks to show relationships between any two nodes. To find all the paths and the distance between the nodes for each path, we first get a list of distinct paths between the two nodes. The length of each path is the number of hops on the graph needed to reach the destination node from the start node. Often, you need to get the shortest path between two nodes; of the two paths from Barbara to Jill, the shortest path can be found by using

    PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
        Traversal.expanderForTypes(FRIEND, Direction.OUTGOING), MAX_DEPTH);
    Iterable<Path> paths = finder.findAllPaths(barbara, jill);
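To make the idea of path length as a hop count concrete without depending on Neo4J, here is a minimal breadth-first sketch in plain Java. It is an illustration only: the class FriendHops is hypothetical, the graph is a simple adjacency map of outgoing FRIEND edges, and it is not how GraphAlgoFactory is implemented.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FriendHops {
    // Breadth-first search over outgoing FRIEND edges. Returns, for every node
    // reachable from start (start itself excluded), the number of hops on the
    // shortest path from start to that node.
    static Map<String, Integer> hopsFrom(String start, Map<String, List<String>> friends) {
        Map<String, Integer> hops = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        hops.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> outgoing =
                friends.getOrDefault(current, Collections.<String>emptyList());
            for (String friend : outgoing) {
                if (!hops.containsKey(friend)) {
                    // First visit in breadth-first order is the shortest route.
                    hops.put(friend, hops.get(current) + 1);
                    queue.add(friend);
                }
            }
        }
        hops.remove(start);
        return hops;
    }
}
```

Calling FriendHops.hopsFrom("Barbara", friends) on an adjacency map gives every friend of a friend together with its distance, which combines what the traverser and the shortest-path finder report.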

Many other graph algorithms can be applied to the graph at hand, such as Dijkstra’s algorithm [Dijkstra’s] for finding the shortest or cheapest path between nodes.

Neo4J also provides the Cypher query language to query the graph. Cypher needs a node to START the query. The start node can be identified by its node ID, a list of node IDs, or index lookups. Cypher uses the MATCH keyword for matching patterns in relationships; the WHERE keyword filters the properties on a node or relationship. The RETURN keyword specifies what gets returned by the query: nodes, relationships, or fields on the nodes or relationships. Cypher also provides methods to ORDER, AGGREGATE, SKIP, and LIMIT the data. The general shape of a query is

    START beginningNode = (beginning node specification)
    MATCH (relationship, pattern matches)
    WHERE (filtering condition: on data in nodes and relationships)
    RETURN (What to return: nodes, relationships, properties)
    ORDER BY (properties to order by)
    SKIP (nodes to skip from top)
    LIMIT (limit results)

In Figure 11.2, we find all nodes connected to Barbara, either incoming or outgoing, by using the --.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)--(connected_node)
    RETURN connected_node

When interested in directional significance, we can use

    MATCH (barbara)<--(connected_node)

for incoming relationships or

    MATCH (barbara)-->(connected_node)

for outgoing relationships. Match can also be done on specific relationships using the :RELATIONSHIP_TYPE convention and returning the required fields or nodes.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)-[:FRIEND]->(friend_node)
    RETURN friend_node.name, friend_node.location

We start with Barbara, find all outgoing relationships with the type of FRIEND, and return the friends’ names and locations. The relationship type query only works for the depth of one level; we can make it work for greater depths and find out the depth of each of the result nodes.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH path = barbara-[:FRIEND*1..3]->end_node
    RETURN barbara.name, end_node.name, length(path)

Similarly, we can query for relationships where a particular relationship property exists. We can also filter on the properties of relationships and query if a property exists or not.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)-[relation]->(related_node)
    WHERE type(relation) = 'FRIEND' AND relation.share
    RETURN related_node.name, relation.since

There are many other query features in the Cypher language that can be used to query database graphs.

11.2.5. Scaling

In NoSQL databases, one of the commonly used scaling techniques is sharding, where data is split and distributed across different servers. With graph databases, sharding is difficult, as graph databases are not aggregate-oriented but relationship-oriented. Since any given node can be related to any other node, storing related nodes on the same server is better for graph traversal. Traversing a graph when the nodes are on different machines is not good for performance. Knowing this limitation of the graph databases, we can still scale them using some common techniques described by Jim Webber [Webber Neo4J Scaling].

Generally speaking, there are three ways to scale graph databases. Since machines now can come with lots of RAM, we can add enough RAM to the server so that the working set of nodes and relationships is held entirely in memory. This technique is only helpful if the dataset that we are working with will fit in a realistic amount of RAM.

We can improve the read scaling of the database by adding more slaves with read-only access to the data, with all the writes going to the master. This pattern of writing once and reading from many servers is a proven technique in MySQL clusters and is really useful when the dataset is large enough to not fit in a single machine’s RAM, but small enough to be replicated across multiple machines. Slaves can also contribute to availability and read-scaling, as they can be configured to never become a master, remaining always read-only.

When the dataset size makes replication impractical, we can shard (see the “Sharding” section on p. 38) the data from the application side using domain-specific knowledge. For example, nodes that relate to North America can be created on one server while the nodes that relate to Asia are created on another. This application-level sharding needs to understand that nodes are stored on physically different databases (Figure 11.3).

Figure 11.3. Application-level sharding of nodes
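Application-level sharding means the routing decision lives in application code. A minimal sketch, assuming the region is known when a node is created: the class RegionShardRouter, the region names, and the server addresses used with it are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side router; it knows which database server
// holds the nodes of each region.
class RegionShardRouter {
    private final Map<String, String> serverByRegion = new HashMap<>();

    void assign(String region, String serverUrl) {
        serverByRegion.put(region, serverUrl);
    }

    // The application, not the database, decides where a node lives.
    String serverFor(String region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("no shard configured for region " + region);
        }
        return server;
    }
}
```

A traversal that crosses regions would have to consult the router and query more than one server, which is exactly the cost the text warns about.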

11.3. Suitable Use Cases

Let’s look at some suitable use cases for graph databases.

11.3.1. Connected Data

Social networks are where graph databases can be deployed and used very effectively. These social graphs don’t have to be only of the friend kind; for example, they can represent employees, their knowledge, and where they worked with other employees on different projects. Any link-rich domain is well suited for graph databases.

If you have relationships between domain entities from different domains (such as social, spatial, commerce) in a single database, you can make these relationships more valuable by providing the ability to traverse across domains.

11.3.2. Routing, Dispatch, and Location-Based Services

Every location or address that has a delivery is a node, and all the nodes where the delivery has to be made by the delivery person can be modeled as a graph of nodes. Relationships between nodes can have the property of distance, thus allowing you to deliver the goods in an efficient manner. Distance and location properties can also be used in graphs of places of interest, so that your application can provide recommendations of good restaurants or entertainment options nearby. You can also create nodes for your points of sales, such as bookstores or restaurants, and notify the users when they are close to any of the nodes to provide location-based services.

11.3.3. Recommendation Engines

As nodes and relationships are created in the system, they can be used to make recommendations like “your friends also bought this product” or “when invoicing this item, these other items are usually invoiced.” Or, they can be used to make recommendations to travelers mentioning that when other visitors come to Barcelona they usually visit Antoni Gaudí’s creations.

An interesting side effect of using the graph databases for recommendations is that as the data size grows, the number of nodes and relationships available to make the recommendations quickly increases. The same data can also be used to mine information (for example, which products are always bought together, or which items are always invoiced together); alerts can be raised when these conditions are not met. Like other recommendation engines, graph databases can be used to search for patterns in relationships to detect fraud in transactions.

11.4. When Not to Use

In some situations, graph databases may not be appropriate. When you want to update all or a subset of entities (for example, in an analytics solution where all entities may need to be updated with a changed property), graph databases may not be optimal since changing a property on all the nodes is not a straightforward operation. Even if the data model works for the problem domain, some databases may be unable to handle lots of data, especially in global graph operations (those involving the whole graph).

Chapter 12. Schema Migrations

12.1. Schema Changes

The recent trend in discussing NoSQL databases is to highlight their schemaless nature: it is a popular feature that allows developers to concentrate on the domain design without worrying about schema changes. It’s especially true with the rise of agile methods [Agile Methods] where responding to changing requirements is important.

Discussions, iterations, and feedback loops involving domain experts and product owners are important to derive the right understanding of the data; these discussions must not be hampered by a database’s schema complexity. With NoSQL data stores, changes to the schema can be made with the least amount of friction, improving developer productivity (“The Emergence of NoSQL,” p. 9). We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration.

12.2. Schema Changes in RDBMS

While developing with standard RDBMS technologies, we develop objects, their corresponding tables, and their relationships. Consider a simple object model and data model that has Customer, Order, and OrderItems. The ER model would look like Figure 12.1.

Figure 12.1. Data model of an e-commerce system

While this data model supports the current object model, life is good. The first time there is a change in the object model, such as introducing preferredShippingType on the Customer object, we have to change the object and change the database table, because without changing the table the application will be out of sync with the database. When we get errors like ORA-00942: table or view does not exist or ORA-00904: "PREFERRED_SHIPPING_TYPE": invalid identifier, we know we have this problem.

Typically, a database schema migration has been a project in itself. For deployment of the schema changes, database change scripts are developed, using diff techniques, for all the changes in the development database. This approach of creating migration scripts during the deployment/release time is error-prone and does not support agile development methods.

12.2.1. Migrations for Green Field Projects

Scripting the database schema changes during development is better, since we can store these schema changes along with the data migration scripts in the same script file. These script files should be named with incrementing sequential numbers which reflect the database versions; for example, the first change to the database could have a script file named 001_Description_Of_Change.sql. Scripting changes this way allows for the database migrations to be run preserving the order of changes. Shown in Figure 12.2 is a folder of all the changes done to a database so far.

Figure 12.2. Sequence of migrations applied to a database

Now, suppose we need to change the OrderItem table to store the DiscountedPrice and the FullPrice of the item. This will need a change to the OrderItem table and will be change number 007 in our sequence of changes, as shown in Figure 12.3.

Figure 12.3. New change 007_DiscountedPrice.sql applied to the database

We applied a new change to the database. This change’s script has the code for adding a new column, renaming the existing column, and migrating the data needed to make the new feature work. Shown below is the script contained in the change 007_DiscountedPrice.sql:

    ALTER TABLE orderitem ADD discountedprice NUMBER(18,2) NULL;
    UPDATE orderitem SET discountedprice = price;
    ALTER TABLE orderitem MODIFY discountedprice NOT NULL;
    ALTER TABLE orderitem RENAME COLUMN price TO fullprice;
    --//@UNDO
    ALTER TABLE orderitem RENAME COLUMN fullprice TO price;
    ALTER TABLE orderitem DROP COLUMN discountedprice;

The change script shows the schema changes to the database as well as the data migrations needed to be done. In the example shown, we are using DBDeploy [DBDeploy] as the framework to manage the changes to the database. DBDeploy maintains a table in the database, named ChangeLog, where all the changes made to the database are stored. In this table, Change_Number is what tells everyone which changes have been applied to the database. This Change_Number, which is the database version, is then used to find the corresponding numbered script in the folder and apply the changes which have not been applied yet. When we write a script with the change number 007 and apply it to the database using DBDeploy, DBDeploy will check the ChangeLog and pick up all the scripts from the folder that have not yet been applied. Figure 12.4 is the screenshot of DBDeploy applying the change to the database.
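The selection step described above can be sketched in a few lines of Java. This is not DBDeploy's implementation: the class PendingMigrations is hypothetical, and it assumes that every script name starts with a numeric prefix followed by an underscore and that the highest applied Change_Number is already known.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of choosing which numbered change scripts still have to run.
class PendingMigrations {
    // Assumes names such as 007_DiscountedPrice.sql: digits, then '_'.
    static int changeNumber(String scriptName) {
        return Integer.parseInt(scriptName.substring(0, scriptName.indexOf('_')));
    }

    // Returns the scripts numbered above the last applied change,
    // sorted so that they run in the order the changes were written.
    static List<String> pending(List<String> scripts, int lastAppliedChange) {
        List<String> result = new ArrayList<>();
        for (String script : scripts) {
            if (changeNumber(script) > lastAppliedChange) {
                result.add(script);
            }
        }
        result.sort(Comparator.comparingInt(PendingMigrations::changeNumber));
        return result;
    }
}
```

Comparing against a single last-applied number is a simplification; a tool that records every applied change individually could also detect scripts that were skipped.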

Figure 12.4. DBDeploy upgrading the database with change number 007

The best way to integrate with the rest of the developers is to use your project’s version control repository to store all these change scripts, so that you can keep track of the version of the software and the database in the same place, eliminating possible mismatches between the database and the application. There are many other tools for such upgrades, including Liquibase [Liquibase], MyBatis Migrator [MyBatis Migrator], and DBMaintain [DBMaintain].

12.2.2. Migrations in Legacy Projects

Not every project is a green field. How do we implement migrations when an existing application is in production? We found that taking an existing database and extracting its structure into scripts, along with all the database code and any reference data, works as a baseline for the project. This baseline should not contain transactional data. Once the baseline is ready, further changes can be done using the migrations technique described above (Figure 12.5).

Figure 12.5. Use of baseline scripts with a legacy database

One of the main aspects of migrations should be maintaining backward compatibility of the database schema. In many enterprises there are multiple applications using the database; when we change the database for one application, this change should not break other applications. We can achieve backward compatibility by maintaining a transition phase for the change, as described in detail in Refactoring Databases [Ambler and Sadalage].

During a transition phase, the old schema and the new schema are maintained in parallel and are available for all the applications using the database. For this, we have to introduce scaffolding code, such as triggers, views, and virtual columns, ensuring other applications can access the database schema and the data they require without any code changes.

    ALTER TABLE customer ADD fullname VARCHAR2(60);
    UPDATE customer SET fullname = fname;

    CREATE OR REPLACE TRIGGER SyncCustomerFullName
    BEFORE INSERT OR UPDATE
    ON customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    BEGIN
      IF :NEW.fname IS NULL THEN
        :NEW.fname := :NEW.fullname;
      END IF;
      IF :NEW.fullname IS NULL THEN
        :NEW.fullname := :NEW.fname;
      END IF;
    END;
    /

    --Drop Trigger and fname
    --when all applications start using customer.fullname

In the example, we are trying to rename the customer.fname column to customer.fullname as we want to avoid any ambiguity of fname meaning either fullname or first name. A direct rename of the fname column and changing the application code we are responsible for may just work for our application, but it will not work for the other applications in the enterprise that are accessing the same database.

Using the transition phase technique, we introduce the new column fullname, copy the data over to fullname, but leave the old column fname around. We also introduce a BEFORE INSERT OR UPDATE trigger to synchronize data between the columns before they are committed to the database.

Now, when applications read data from the table, they will read either from fname or from fullname but will always get the right data. We can drop the trigger and the fname column once all the applications have moved on to using the new fullname column.

It’s very hard to do schema migrations on large datasets in RDBMS, especially if we have to keep the database available to the applications, as large data movements and structural changes usually create locks on the database tables.

12.3. Schema Changes in a NoSQL Data Store

An RDBMS database has to be changed before the application is changed. This is what the schema-free, or schemaless, approach tries to avoid, aiming at flexibility of schema changes per entity. Frequent changes to the schema are needed to react to frequent market changes and product innovations.

When developing with NoSQL databases, in some cases the schema does not have to be thought about beforehand. We still have to design and think about other aspects, such as the types of relationships (with graph databases), or the names of the column families, rows, columns, order of columns (with column databases), or how the keys are assigned and what the structure of the data inside the value object is (with key-value stores). Even if we didn’t think about these up front, or if we want to change our decisions, it is easy to do so.

The claim that NoSQL databases are entirely schemaless is misleading; while they store the data without regard to the schema the data adheres to, that schema has to be defined by the application, because the data stream has to be parsed by the application when reading the data from the database. Also, the application has to create the data that would be saved in the database. If the application cannot parse the data from the database, we have a schema mismatch even if, instead of the RDBMS database throwing an error, this error is now encountered by the application. Thus, even in schemaless databases, the schema of the data has to be taken into consideration when refactoring the application.

Schema changes especially matter when there is a deployed application and existing production data. For the sake of simplicity, assume we are using a document data store like MongoDB [MongoDB] and we have the same data model as before: customer, order, and orderItems.

    {
      "_id": "4BD8AE97C47016442AF4A580",
      "customerid": 99999,
      "name": "Foo Sushi Inc",
      "since": "12/12/2012",
      "order": {
        "orderid": "4821-UXWE-122012",
        "orderdate": "12/12/2001",
        "orderItems": [{"product": "Fortune Cookies", "price": 19.99}]
      }
    }

Application code to write this document structure to MongoDB:

    BasicDBObject orderItem = new BasicDBObject();
    orderItem.put("product", productName);
    orderItem.put("price", price);
    orderItems.add(orderItem);

Code to read the document back from the database:

    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    Double price = item.getDouble("price");

Changing the objects to add preferredShippingType does not require any change in the database, as the database does not care that different documents do not follow the same schema. This allows for faster development and easy deployments. All that needs to be deployed is the application; no changes on the database side are needed. The code has to make sure that documents that do not have the preferredShippingType attribute can still be parsed, and that’s all.

Of course we are simplifying the schema change situation here. Let’s look at the schema change we made before: introducing discountedPrice and renaming price to fullPrice. To make this change, we rename the price attribute to fullPrice and add a discountedPrice attribute. The changed document is

    {
      "_id": "5BD8AE97C47016442AF4A580",
      "customerid": 66778,
      "name": "India House",
      "since": "12/12/2012",
      "order": {
        "orderid": "4821-UXWE-222012",
        "orderdate": "12/12/2001",
        "orderItems": [{"product": "Chair Covers", "fullPrice": 29.99, "discountedPrice": 26.99}]
      }
    }

Once we deploy this change, new customers and their orders can be saved and read back without problems, but for existing orders the price of their product cannot be read, because now the code is looking for fullPrice but the document has only price.

12.3.1. Incremental Migration

Schema mismatch trips many new converts to the NoSQL world. When the schema is changed in the application, we have to make sure to convert all the existing data to the new schema (depending on data size, this might be an expensive operation). Another option would be to make sure that data written before the schema changed can still be parsed by the new code, and that when it is saved, it is saved back in the new schema. This technique, known as incremental migration, will migrate data over time; some data may never get migrated, because it was never accessed. We are reading both price and fullPrice from the document:

    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    // get() returns null for a missing attribute, so both old and new documents parse
    Double fullPrice = (Double) item.get("price");
    if (fullPrice == null) {
        fullPrice = (Double) item.get("fullPrice");
    }
    Double discountedPrice = (Double) item.get("discountedPrice");

When writing the document back, the old attribute price is not saved:

    BasicDBObject orderItem = new BasicDBObject();
    orderItem.put("product", productName);
    orderItem.put("fullPrice", price);
    orderItem.put("discountedPrice", discountedPrice);
    orderItems.add(orderItem);

When using incremental migration, there could be many versions of the object on the application side that can translate the old schema to the new schema; while saving the object back, it is saved using the new object. This gradual migration of the data helps the application evolve faster.

The incremental migration technique will complicate the object design, especially as new changes are being introduced yet old changes are not being taken out. This period between the change deployment and the last object in the database migrating to the new schema is known as the transition period (Figure 12.6). Keep it as short as possible and focus it to the minimum possible scope; this will help you keep your objects clean.

Figure 12.6. Transition period of schema changes

The incremental migration technique can also be implemented with a schema_version field on the data, used by the application to choose the correct code to parse the data into the objects. When saving, the data is migrated to the latest version and the schema_version is updated to reflect that.

Having a proper translation layer between your domain and the database is important so that, as the schema changes, managing multiple versions of the schema is restricted to the translation layer and does not leak into the whole application.

Mobile apps create special requirements. Since we cannot enforce the latest upgrades of the application, the application should be able to handle almost all versions of the schema.
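The schema_version idea can be sketched as a small translation layer. The sketch below makes several assumptions: documents are modeled as plain maps rather than driver objects, the class OrderItemTranslator is hypothetical, a missing schema_version is treated as version 1, and version 2 is the fullPrice/discountedPrice layout from the earlier example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical translation layer that upgrades stored order items on read.
class OrderItemTranslator {
    static final int LATEST_VERSION = 2;

    // Returns a copy of the stored item in the latest schema; the input is untouched.
    static Map<String, Object> toLatest(Map<String, Object> stored) {
        Map<String, Object> item = new HashMap<>(stored);
        Object version = item.get("schema_version");
        int schemaVersion = (version == null) ? 1 : ((Number) version).intValue();
        if (schemaVersion < 2) {
            // Version 1 had a single price attribute; version 2 splits it into
            // fullPrice and discountedPrice, both seeded from the old price.
            Object price = item.remove("price");
            item.put("fullPrice", price);
            item.put("discountedPrice", price);
        }
        item.put("schema_version", LATEST_VERSION);
        return item;
    }
}
```

Because the upgrade lives in one place, the rest of the application only ever sees the latest shape, and saving the returned map migrates that document.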

12.3.2. Migrations in Graph Databases

Graph databases have edges that have types and properties. If you change the type of these edges in the codebase, you no longer can traverse the database, rendering it unusable. To get around this, you can traverse all the edges and change the type of each edge. This operation can be expensive and requires you to write code to migrate all the edges in the database.

If we need to maintain backward compatibility or do not want to change the whole graph in one go, we can just create new edges between the nodes; later when we are comfortable about the change, the old edges can be dropped. We can use traversals with multiple edge types to traverse the graph using the new and old edge types. This technique may help a great deal with large databases, especially if we want to maintain high availability.

If we have to change properties on all the nodes or edges, we have to fetch all the nodes and change all the properties that need to be changed. An example would be adding NodeCreatedBy and NodeCreatedOn to all existing nodes to track the changes being made to each node.

    for (Node node : database.getAllNodes()) {
        node.setProperty("NodeCreatedBy", getSystemUser());
        node.setProperty("NodeCreatedOn", getSystemTimeStamp());
    }

We may have to change the data in the nodes. New data may be derived from the existing node data, or it could be imported from some other source. The migration can be done by fetching all nodes using an index provided by the source of data and writing relevant data to each node.

12.3.3. Changing Aggregate Structure

Sometimes you need to change the schema design, for example by splitting large objects into smaller ones that are stored independently. Suppose you have a customer aggregate that contains all the customer’s orders, and you want to separate the customer and each of their orders into different aggregate units.

You then have to ensure that the code can work with both versions of the aggregates. If it does not find the old objects, it will look for the new aggregates.

Code that runs in the background can read one aggregate at a time, make the necessary change, and save the data back into different aggregates. The advantage of operating on one aggregate at a time is that this way, you’re not affecting data availability for the application.
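The background step can be sketched as a pure function over one aggregate. Everything here is an assumption made for illustration: aggregates are plain maps, the embedded orders live under a key named orders, and the class AggregateSplitter does not belong to any database driver.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical background step: splits one customer aggregate that embeds its
// orders into a customer aggregate plus one aggregate per order.
class AggregateSplitter {
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> split(Map<String, Object> customerWithOrders) {
        List<Map<String, Object>> result = new ArrayList<>();
        Map<String, Object> customer = new HashMap<>(customerWithOrders);
        Object embedded = customer.remove("orders");
        result.add(customer); // first element: the customer without its orders
        if (embedded != null) {
            for (Map<String, Object> order : (List<Map<String, Object>>) embedded) {
                Map<String, Object> orderAggregate = new HashMap<>(order);
                // Each order keeps a reference back to its customer.
                orderAggregate.put("customerid", customer.get("customerid"));
                result.add(orderAggregate);
            }
        }
        return result;
    }
}
```

A background job would call split for one stored customer at a time and save each returned map as its own aggregate, so only that one customer is in flux at any moment.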

12.4. Further Reading

For more on migrations with relational databases, see [Ambler and Sadalage]. Although much of this content is specific to relational work, the general principles in migration will also apply to other databases.

12.5. Key Points

• Databases with strong schemas, such as relational databases, can be migrated by saving each schema change, plus its data migration, in a version-controlled sequence.

• Schemaless databases still need careful migration due to the implicit schema in any code that accesses the data.

• Schemaless databases can use the same migration techniques as databases with strong schemas.

• Schemaless databases can also read data in a way that’s tolerant to changes in the data’s implicit schema and use incremental migration to update data.

Chapter 13. Polyglot Persistence

Different databases are designed to solve different problems. Using a single database engine for all of the requirements usually leads to non-performant solutions; storing transactional data, caching session information, and traversing a graph of customers and the products their friends bought are essentially different problems. Even in the RDBMS space, the requirements of an OLAP and OLTP system are very different; nonetheless, they are often forced into the same schema.

Let’s think of data relationships. RDBMS solutions are good at enforcing that relationships exist. If we want to discover relationships, or have to find data from different tables that belong to the same object, then the use of RDBMS starts being difficult.

Database engines are designed to perform certain operations on certain data structures and data amounts very well, such as operating on sets of data, storing and retrieving keys and their values really fast, or storing rich documents or complex graphs of information.

13.1. Disparate Data Storage Needs

Many enterprises tend to use the same database engine to store business transactions, session management data, and for other storage needs such as reporting, BI, data warehousing, or logging information (Figure 13.1).

Figure 13.1. Use of RDBMS for every aspect of storage for the application

The session, shopping cart, or order data do not need the same properties of availability, consistency, or backup requirements. Does session management storage need the same rigorous backup/recovery strategy as the e-commerce orders data? Does the session management storage need more availability of an instance of database engine to write/read session data?

In 2006, Neal Ford coined the term polyglot programming, to express the idea that applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems. Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language.

Similarly, when working on an e-commerce business problem, using a data store for the shopping cart which is highly available and can scale is important, but the same data store cannot help you find products bought by the customers’ friends, which is a totally different question. We use the term polyglot persistence to define this hybrid approach to persistence.

13.2. Polyglot Data Store Usage

Let’s take our e-commerce example and use the polyglot persistence approach to see how some of these data stores can be applied (Figure 13.2). A key-value data store could be used to store the shopping cart data before the order is confirmed by the customer and also store the session data so that the RDBMS is not used for this transient data. Key-value stores make sense here since the shopping cart is usually accessed by user ID and, once confirmed and paid by the customer, can be saved in the RDBMS. Similarly, session data is keyed by the session ID.

Figure 13.2. Use of key-value stores to offload session and shopping cart data storage

If we need to recommend products to customers when they place products into their shopping carts (for example, “your friends also bought these products” or “your friends bought these accessories for this product”), then introducing a graph data store in the mix becomes relevant (Figure 13.3).

Figure 13.3. Example implementation of polyglot persistence

It is not necessary for the application to use a single data store for all of its needs, since different databases are built for different purposes and not all problems can be elegantly solved by a single database.

Even using specialized relational databases for different purposes, such as data warehousing appliances or analytics appliances within the same application, can be viewed as polyglot persistence.

13.3. Service Usage over Direct Data Store Usage

As we move towards multiple data stores in the application, there may be other applications in the enterprise that could benefit from the use of our data stores or the data stored in them. Using our example, the graph data store can serve data to other applications that need to understand, for example, which products are being bought by a certain segment of the customer base.

Instead of each application talking independently to the graph database, we can wrap the graph database into a service so that all relationships between the nodes can be saved in one place and queried by all the applications (Figure 13.4). The data ownership and the APIs provided by the service are more useful than a single application talking to multiple databases.

Figure 13.4. Example implementation of wrapping data stores into services

The philosophy of service wrapping can be taken further: You could wrap all databases into services, letting the application talk only to a bunch of services (Figure 13.5). This allows for the databases inside the services to evolve without you having to change the dependent applications.

Figure 13.5. Using services instead of talking to databases

Many NoSQL data store products, such as Riak [Riak] and Neo4J [Neo4J], actually provide out-of-the-box REST APIs.

13.4. Expanding for Better Functionality

Often, we cannot really change the data storage for a specific usage to something different, because of the existing legacy applications and their dependency on existing data storage. We can, however, add functionality such as caching for better performance, or use indexing engines such as Solr [Solr] so that search can be more efficient (Figure 13.6). When technologies like this are introduced, we have to make sure data is synchronized between the data storage for the application and the cache or indexing engine.

Figure13.6.UsingsupplementalstoragetoenhancelegacystorageWhiledoingthis,weneedtoupdatetheindexeddataasthedataintheapplicationdatabasechanges.

Theprocessofupdatingthedatacanbereal-timeorbatch,aslongasweensurethattheapplication

candealwithstaledataintheindex/searchengine.Theeventsourcing(“EventSourcing,”p.142)patterncanbeusedtoupdatetheindex.
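The synchronization described above might look roughly like the following sketch, in which the application database records each change and a batch step later drains those changes into the index. The classes `ProductStore` and `SearchIndex` and the function `sync_index` are hypothetical stand-ins; a real deployment would write to an engine such as Solr rather than to an in-memory inverted index.

```python
from collections import defaultdict, deque


class ProductStore:
    """Stand-in for the legacy application database (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.pending_changes = deque()   # change events awaiting indexing

    def save(self, product_id, description):
        self.rows[product_id] = description
        # Record the change so the index can catch up later.
        self.pending_changes.append((product_id, description))


class SearchIndex:
    """Tiny inverted index standing in for a real indexing engine."""

    def __init__(self):
        self._ids_by_word = defaultdict(set)
        self._words_by_id = {}

    def apply_change(self, product_id, description):
        # Remove the old entry first so stale words do not linger.
        for word in self._words_by_id.get(product_id, ()):
            self._ids_by_word[word].discard(product_id)
        words = set(description.lower().split())
        self._words_by_id[product_id] = words
        for word in words:
            self._ids_by_word[word].add(product_id)

    def search(self, word):
        return sorted(self._ids_by_word.get(word.lower(), ()))


def sync_index(store, index):
    """Batch step: drain the pending change events into the index."""
    while store.pending_changes:
        index.apply_change(*store.pending_changes.popleft())


store = ProductStore()
index = SearchIndex()
store.save(1, "Red bicycle")
stale_result = index.search("bicycle")   # the index has not caught up yet
sync_index(store, index)
fresh_result = index.search("bicycle")
store.save(1, "Red scooter")
sync_index(store, index)
after_update = index.search("bicycle")
```

Between `save` and `sync_index` the index is stale, which is exactly the window the application has to tolerate when updates run in batch.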

13.5. Choosing the Right Technology

There is a rich choice of data storage solutions. Initially, the pendulum had shifted from speciality databases to a single RDBMS database which allows all types of data models to be stored, although with some abstraction. The trend is now shifting back to using the data storage that supports the implementation of solutions natively.

If we want to recommend products to customers based on what’s in their shopping carts and which other products were bought by customers who bought those products, it can be implemented in any of the data stores by persisting the data with the correct attributes to answer our questions. The trick is to use the right technology, so that when the questions change, they can still be asked with the same data store without losing existing data or changing it into new formats.

Let’s go back to our new feature need. We can use an RDBMS to solve this using a hierarchical query and modeling the tables accordingly. When we need to change the traversal, we will have to refactor the database, migrate the data, and start persisting new data. Instead, if we had used a data store that tracks relations between nodes, we could have just programmed the new relations and kept using the same data store with minimal changes.
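To make the recommendation example a little more concrete, here is a minimal sketch of a store that tracks typed relations between nodes. The names `RelationStore`, `recommend`, and the relation labels `BOUGHT` and `VIEWED` are our own assumptions, and the in-memory dictionaries only imitate what a graph database would provide.

```python
from collections import Counter, defaultdict


class RelationStore:
    """Hypothetical in-memory store of typed relations between nodes."""

    def __init__(self):
        self._out = defaultdict(set)   # (relation, source) -> targets
        self._in = defaultdict(set)    # (relation, target) -> sources

    def relate(self, source, relation, target):
        self._out[(relation, source)].add(target)
        self._in[(relation, target)].add(source)

    def targets(self, source, relation):
        return self._out.get((relation, source), set())

    def sources(self, target, relation):
        return self._in.get((relation, target), set())


def recommend(store, cart, relation="BOUGHT"):
    """Rank products linked to the customers who are linked to the cart items."""
    scores = Counter()
    for product in cart:
        for customer in store.sources(product, relation):
            for other in store.targets(customer, relation):
                if other not in cart:
                    scores[other] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [product for product, _ in ranked]


store = RelationStore()
store.relate("anna", "BOUGHT", "tent")
store.relate("anna", "BOUGHT", "sleeping bag")
store.relate("ben", "BOUGHT", "tent")
store.relate("ben", "BOUGHT", "sleeping bag")
store.relate("ben", "BOUGHT", "stove")
by_purchase = recommend(store, {"tent"})

# A new question only needs a new relation type; existing data is untouched.
store.relate("carla", "VIEWED", "tent")
store.relate("carla", "VIEWED", "lantern")
by_views = recommend(store, {"tent"}, relation="VIEWED")
```

When the question changes from "bought together" to "viewed together", only new relations are added; nothing already stored has to be migrated.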

13.6. Enterprise Concerns with Polyglot Persistence

Introduction of NoSQL data storage technologies will force the enterprise DBAs to think about how to use the new storage. The enterprise is used to having uniform RDBMS environments; whatever database an enterprise starts using first, chances are that over the years all its applications will be built around the same database. In this new world of polyglot persistence, the DBA groups will have to become more poly-skilled: to learn how some of these NoSQL technologies work, how to monitor these systems, back them up, and take data out of and put into these systems.

Once the enterprise decides to use any NoSQL technology, issues such as licensing, support, tools, upgrades, drivers, auditing, and security come up. Many NoSQL technologies are open-source and have an active community of supporters; also, there are companies that provide commercial support. There is not a rich ecosystem of tools, but the tool vendors and the open-source community are catching up, releasing tools such as MongoDB Monitoring Service [Monitoring], DataStax OpsCenter [OpsCenter], or the Rekon browser for Riak [Rekon].

One other area that enterprises are concerned about is security of the data: the ability to create users and assign privileges to see or not see data at the database level. Most of the NoSQL databases do not have very robust security features, but that’s because they are designed to operate differently. In a traditional RDBMS, data was served by the database and we could get to the database using any query tools. With the NoSQL databases, there are query tools as well, but the idea is for the application to own the data and serve it using services. With this approach, the responsibility for the security lies with the application. Having said that, there are NoSQL technologies that introduce security features.

Enterprises often have data warehouse systems, BI, and analytics systems that may need data from the polyglot data sources. Enterprises will have to ensure that the ETL tools or any other mechanism they are using to move data from source systems to the data warehouse can read data from the NoSQL data store. The ETL tool vendors are adding the ability to talk to NoSQL databases; for example, Pentaho [Pentaho] can talk to MongoDB and Cassandra.

Every enterprise runs analytics of some sort. As the sheer volume of data that needs to be captured increases, enterprises are struggling to scale their RDBMS systems to write all this data to the databases. A huge number of writes and the need to scale for writes are a great use case for NoSQL databases that allow you to write large volumes of data.

13.7. Deployment Complexity

Once we start down the path of using polyglot persistence in the application, deployment complexity needs careful consideration. The application now needs all databases in production at the same time. You will need to have these databases in your UAT, QA, and Dev environments. As most of the NoSQL products are open-source, there are few license cost ramifications. They also support automation of installation and configuration. For example, to install a database, all that needs to be done is download and unzip the archive, which can be automated using curl and unzip commands. These products also have sensible defaults and can be started with minimum configuration.
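The download-and-unzip step can be scripted in many ways. The sketch below uses Python's standard library in place of the curl and unzip commands, so it illustrates the idea rather than the exact commands named above. The function name `install_archive`, the archive name `exampledb.zip`, and its contents are hypothetical; the demonstration builds a small archive locally and fetches it through a `file://` URL so that it runs without network access.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def install_archive(archive_url, install_dir):
    """Download a zip archive and unpack it into install_dir."""
    install_dir = Path(install_dir)
    install_dir.mkdir(parents=True, exist_ok=True)
    archive_path = install_dir / "download.zip"
    urllib.request.urlretrieve(archive_url, archive_path)   # the "curl" step
    with zipfile.ZipFile(archive_path) as archive:          # the "unzip" step
        archive.extractall(install_dir)
    archive_path.unlink()                                   # remove the download
    return sorted(path.name for path in install_dir.iterdir())


# Demonstration without network access: build a small archive locally and
# fetch it through a file:// URL. A real script would pass the product's
# actual download URL instead.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "exampledb.zip"
    with zipfile.ZipFile(source, "w") as archive:
        archive.writestr("exampledb.conf", "port=9999\n")
    installed = install_archive(source.as_uri(), Path(workdir) / "dev")
```

Running the same function against each environment's install directory gives UAT, QA, and Dev the same database layout.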

13.8. Key Points

• Polyglot persistence is about using different data storage technologies to handle varying data storage needs.

• Polyglot persistence can apply across an enterprise or within a single application.

• Encapsulating data access into services reduces the impact of data storage choices on other parts of a system.

• Adding more data storage technologies increases complexity in programming and operations, so the advantages of a good data storage fit need to be weighed against this complexity.

Chapter 14. Beyond NoSQL

The appearance of NoSQL databases has done a great deal to shake up and open up the world of databases, but we think the kind of NoSQL databases we have discussed here is only part of the picture of polyglot persistence. So it makes sense to spend some time discussing solutions that don’t easily fit into the NoSQL bucket.

14.1. File Systems

Databases are very common, but file systems are almost ubiquitous. In the last couple of decades they’ve been widely used for personal productivity documents, but not for enterprise applications. They don’t advertise any internal structure, so they are more like key-value stores with a hierarchic key. They also provide little control over concurrency other than simple file locking, which itself is similar to the way NoSQL only provides locking within a single aggregate.

File systems have the advantage of being simple and widely implemented. They cope well with very large entities, such as video and audio. Often, databases are used to index media assets stored in files. Files also work very well for sequential access, such as streaming, which can be handy for data which is append-only.

Recent attention to clustered environments has seen a rise of distributed file systems. Technologies like the Google File System and Hadoop [Hadoop] provide support for replication of files. Much of the discussion of map-reduce is about manipulating large files on cluster systems, with tools for automatic splitting of large files into segments to be processed on multiple nodes. Indeed a common entry path into NoSQL is from organizations that have been using Hadoop.

File systems work best for a relatively small number of large files that can be processed in big chunks, preferably in a streaming style. Large numbers of small files generally perform badly; this is where a data store becomes more efficient. Files also provide no support for queries without additional indexing tools such as Solr [Solr].

14.2. Event Sourcing

Event sourcing is an approach to persistence that concentrates on persisting all the changes to a persistent state, rather than persisting the current application state itself. It’s an architectural pattern that works quite well with most persistence technologies, including relational databases. We mention it here because it also underpins some of the more unusual ways of thinking about persistence.

Consider an example of a system that keeps a log of the location of ships (Figure 14.1). It has a simple ship record that keeps the name of the ship and its current location. In the usual way of thinking, when we hear that the ship King Roy has arrived in San Francisco, we change the value of King Roy’s location field to San Francisco. Later on, we hear it’s departed, so we change it to at sea, changing it again once we know it’s arrived in Hong Kong.

Figure 14.1. In a typical system, notice of a change causes an update to the application’s state.

With an event-sourced system, the first step is to construct an event object that captures the information about the change (Figure 14.2). This event object is stored in a durable event log. Finally, we process the event in order to update the application’s state.
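The three steps just described (construct the event, store it, process it) can be sketched for the ship example as follows. The class names, the dates, and the plain Python list standing in for a durable event log are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ArrivalEvent:
    ship: str
    port: str
    occurred: datetime


@dataclass(frozen=True)
class DepartureEvent:
    ship: str
    occurred: datetime


class ShipTracker:
    def __init__(self):
        self.event_log = []    # stand-in for a durable, append-only log
        self.locations = {}    # derived application state

    def handle(self, event):
        self.event_log.append(event)   # store the event first
        self._apply(event)             # then update the application state

    def _apply(self, event):
        if isinstance(event, ArrivalEvent):
            self.locations[event.ship] = event.port
        elif isinstance(event, DepartureEvent):
            self.locations[event.ship] = "at sea"


tracker = ShipTracker()
tracker.handle(ArrivalEvent("King Roy", "San Francisco", datetime(2012, 1, 1)))
tracker.handle(DepartureEvent("King Roy", datetime(2012, 1, 3)))
tracker.handle(ArrivalEvent("King Roy", "Hong Kong", datetime(2012, 2, 1)))
```

The `locations` dictionary holds only the latest position, while `event_log` keeps the full history of how the ship got there.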

Figure 14.2. With event sourcing, the system stores each event, together with the derived application state.

As a consequence, in an event-sourced system we store every event that’s caused a state change of the system in the event log, and the application’s state is entirely derivable from this event log. At any time, we can safely throw away the application state and rebuild it from the event log.

In theory, event logs are all you need because you can always recreate the application state whenever you need it by replaying the event log. In practice, this may be too slow. As a result, it’s usually best to provide the ability to store and recreate the application state in a snapshot. A snapshot is designed to persist the memory image optimized for rapid recovery of the state. It is an optimization aid, so it should never take precedence over the event log for authority on the data.

How frequently you take a snapshot depends on your uptime needs. The snapshot doesn’t need to be completely up to date, as you can rebuild memory by loading the latest snapshot and then replaying all events processed since that snapshot was taken. An example approach would be to take a snapshot every night; should the system go down during the day, you’d reload last night’s snapshot followed by today’s events. If you can do that quickly enough, all will be fine.

To get a full record of every change in your application state, you need to keep the event log going back to the beginning of time for your application. But in many cases such a long-lived record isn’t necessary, as you can fold older events into a snapshot and only use the event log after the date of the snapshot.

Using event sourcing has a number of advantages. You can broadcast events to multiple systems, each of which can build a different application state for different purposes (Figure 14.3). For read-intensive systems, you can provide multiple read nodes, with potentially different schemas, while concentrating the writes on a different processing system (an approach broadly known as CQRS [CQRS]).

Figure 14.3. Events can be broadcast to multiple display systems.

Event sourcing is also an effective platform for analyzing historic information, since you can replicate any past state from the event log. You can also easily investigate alternative scenarios by introducing hypothetical events into an analysis processor.

Event sourcing does add some complexity; most notably, you have to ensure that all state changes are captured and stored as events. Some architectures and tools can make that inconvenient. Any collaboration with external systems needs to take the event sourcing into account; you’ll need to be careful of external side effects when replaying events to rebuild an application state.
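Snapshot-based recovery, as discussed in this section, could be sketched along the following lines. The event tuples, the sequence numbers, the second ship name, and the function names are assumptions for illustration; the point is only that loading a snapshot and replaying the later events yields the same state as replaying the whole log.

```python
import copy


def apply_event(state, event):
    """Apply one (sequence, ship, location) event to the state dictionary."""
    _, ship, location = event
    state[ship] = location


def take_snapshot(state, last_sequence):
    """Copy the state together with the sequence number it reflects."""
    return {"state": copy.deepcopy(state), "last_sequence": last_sequence}


def recover(snapshot, event_log):
    """Load the snapshot, then replay only the events recorded after it."""
    state = copy.deepcopy(snapshot["state"])
    for event in event_log:
        if event[0] > snapshot["last_sequence"]:
            apply_event(state, event)
    return state


event_log = [
    (1, "King Roy", "San Francisco"),
    (2, "King Roy", "at sea"),
]
state = {}
for event in event_log:
    apply_event(state, event)

nightly_snapshot = take_snapshot(state, last_sequence=2)

# Events that arrive after the snapshot was taken.
event_log.append((3, "King Roy", "Hong Kong"))
event_log.append((4, "Prince Trevor", "Los Angeles"))

recovered = recover(nightly_snapshot, event_log)
rebuilt_from_scratch = recover({"state": {}, "last_sequence": 0}, event_log)
```

Replaying here only mutates a dictionary. If `apply_event` also sent notifications to an external system, the replay path would need a way to suppress them.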

14.3. Memory Image

One of the consequences of event sourcing is that the event log becomes the definitive persistent record, but it is not necessary for the application state to be persistent. This opens up the option of keeping the application state in memory using only in-memory data structures. Keeping all your working data in memory provides a performance advantage, since there’s no disk I/O to deal with when an event is processed. It also simplifies programming since there is no need to perform mapping between disk and in-memory data structures.

The obvious limitation here is that you must be able to store all the data you’ll need to access in memory. This is an increasingly viable option; we can remember disk sizes that were considerably less than the current memory sizes. You also need to ensure that you can recover quickly enough from a system crash, either by reloading events from the event log or by running a duplicate system and cutting over.

You’ll need some explicit mechanism to deal with concurrency. One route is a transactional memory system, such as the one that comes with the Clojure language. Another route is to do all input processing on a single thread. Designed carefully, a single-threaded event processor can achieve impressive throughput at low latency [Fowler lmax].

Breaking the separation between in-memory and persistent data also affects how you handle errors. A common approach is to update a model and roll back any changes should an error occur. With a memory image, you’ll usually not have an automated rollback facility; you either have to write your own (complicated) or ensure that you do thorough validation before you begin to apply any changes.
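The validate-before-apply option might look like the sketch below. The `Inventory` class and its stock figures are hypothetical; the idea is simply that every check runs before the first mutation, so a rejected request leaves the in-memory state untouched without needing a rollback.

```python
class InsufficientStock(Exception):
    pass


class Inventory:
    """In-memory model with no automatic rollback (hypothetical example)."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def ship_order(self, order):
        # Validate the whole order first, so that no change is applied
        # unless every change can be applied.
        for item, quantity in order.items():
            if self.stock.get(item, 0) < quantity:
                raise InsufficientStock(item)
        # Only now mutate the in-memory state.
        for item, quantity in order.items():
            self.stock[item] -= quantity


inventory = Inventory({"rope": 5, "anchor": 1})
inventory.ship_order({"rope": 2, "anchor": 1})

try:
    inventory.ship_order({"rope": 1, "anchor": 1})   # no anchors left
    rejected = False
except InsufficientStock:
    rejected = True
```

Had the second order been applied item by item, the rope count would have changed before the anchor check failed, which is the partial update that up-front validation avoids.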

14.4. Version Control

For most software developers, their most common experience of an event-sourced system is a version control system. Version control allows many people on a team to coordinate their modifications of a complex interconnected system, with the ability to explore past states of that system and alternative realities through branching.

When we think of data storage, we tend to think of a single-point-of-time worldview, which is very limiting compared to the complexity supported by a version control system. It’s therefore surprising that data storage tools haven’t borrowed some of the ideas from version control systems. After all, many situations require historic queries and support for multiple views of the world.

Version control systems are built on top of file systems, and thus have many of the same limitations for data storage as a file system. They are not designed for application data storage, so are awkward to use in that context. However, they are worth considering for scenarios where their timeline capabilities are useful.

14.5. XML Databases

Around the turn of the millennium, people seemed to want to use XML for everything, and there was a flurry of interest in databases specifically designed to store and query XML documents. While that flurry had as little impact on the relational dominance as previous blusters, XML databases are still around.

We think of XML databases as document databases where the documents are stored in a data model compatible with XML, and where various XML technologies are used to manipulate the document. You can use various forms of XML schema definitions (DTDs, XML Schema, RELAX NG) to check document formats, run queries with XPath and XQuery, and perform transformations with XSLT.

Relational databases took on XML and blended these XML capabilities with relational ones, usually by embedding XML documents as a column type and allowing some way to blend SQL and XML query languages.

Of course there’s no reason why you can’t use XML as a structuring mechanism within a key-value store. XML is less fashionable these days than JSON, but is equally capable of storing complex aggregates, and XML’s schema and query capabilities are greater than what you can typically get for JSON. Using an XML database means that the database itself is able to take advantage of the XML structure and not just treat the value as a blob, but that advantage needs to be weighed with the other database characteristics.
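As a small illustration of querying an XML aggregate by its structure, the sketch below uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The order document and its element names are made up for the example; an XML database would offer full XPath and XQuery over stored documents.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order id="99">
  <customer>Martin</customer>
  <item sku="27" quantity="2"><name>NoSQL Distilled</name></item>
  <item sku="31" quantity="1"><name>Refactoring Databases</name></item>
</order>
"""

order = ET.fromstring(ORDER_XML)

# Path expressions select by element structure.
item_names = [name.text for name in order.findall("./item/name")]

# An attribute predicate filters items by value.
single_copies = [item.get("sku") for item in order.findall("./item[@quantity='1']")]

customer = order.findtext("customer")
```

The same document stored as an opaque value in a key-value store would have to be fetched and parsed by the application before any of these questions could be asked.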

14.6. Object Databases

When object-oriented programming started its rise in popularity, there was a flurry of interest in object-oriented databases. The focus here was the complexity of mapping from in-memory data structures to relational tables. The idea of an object-oriented database is that you avoid this complexity: the database would automatically manage the storage of in-memory structures onto disk. You could think of it as a persistent virtual memory system, allowing you to program with persistence yet without taking any notice of a database at all.

Object databases didn’t take off. One reason was that the benefit of the close integration with the application meant you couldn’t easily access data other than with that application. A shift from integration databases to application databases could well make object databases more viable in the future.

An important issue with object databases is how to deal with migration as the data structures change. Here, the close linkage between the persistent storage and in-memory structures can become a problem. Some object databases include the ability to add migration functions to object definitions.

14.7. Key Points

• NoSQL is just one set of data storage technologies. As these technologies increase our comfort with polyglot persistence, we should consider other data storage technologies whether or not they bear the NoSQL label.

Chapter 15. Choosing Your Database

At this point in the book, we’ve covered a lot of the general issues you need to be aware of to make decisions in the new world of polyglot persistence. It’s now time to talk about choosing your databases for future development work. Naturally, we don’t know your particular circumstances, so we can’t give you your answer, nor can we reduce it to a simple set of rules to follow. Furthermore, it’s still early days in the production use of NoSQL systems, so even what we do know is immature; in a couple of years we may well think differently.

We see two broad reasons to consider a NoSQL database: programmer productivity and data access performance. In different cases these forces may complement or contradict each other. Both of them are difficult to assess early on in a project, which is awkward since your choice of a data storage model is difficult to abstract so as to allow you to change your mind later on.

15.1. Programmer Productivity

Talk to any developer of an enterprise application, and you’ll sense frustration from working with relational databases. Information is usually collected and displayed in terms of aggregates, but it has to be transformed into relations in order to persist it. This chore is easier than it used to be; during the 1990s many projects groaned under the effort of building object-relational mapping layers. By the 2000s, we had seen popular ORM frameworks such as Hibernate, iBATIS, and Rails Active Record that reduce much of that burden. But this has not made the problem go away. ORMs are a leaky abstraction; there are always some cases that need more attention, particularly in order to get decent performance.

In this situation aggregate-oriented databases can offer a tempting deal. We can remove the ORM and persist aggregates naturally as we use them. We’ve come across several projects that claim palpable benefits from moving to an aggregate-oriented solution.

Graph databases offer a different simplification. Relational databases do not do a good job with data that has a lot of relationships. A graph database offers both a more natural storage API for this kind of data and query capabilities designed around these kinds of structures.

All kinds of NoSQL systems are better suited to nonuniform data. If you find yourself struggling with a strong schema in order to support ad-hoc fields, then the schemaless NoSQL databases can offer considerable relief.

These are the major reasons why the programming model of NoSQL databases may improve the productivity of your development team. The first step of assessing this for your circumstances is to look at what your software will need to do. Run through the current features and see if and how the data usage fits. As you do this, you may begin to see that a particular data model seems like a good fit. That closeness of fit suggests that using that model will lead to easier programming.

As you do this, remember that polyglot persistence is about using multiple data storage solutions. It may be that you’ll see different data storage models fit different parts of your data. This would suggest using different databases for different aspects of your data. Using multiple databases is inherently more complex than using a single store, but the advantages of a good fit in each case may be better overall.

As you look at the data model fit, pay particular attention to cases where there is a problem. You may see most of your features will work well with an aggregate, but a few will not. Having a few features that don’t fit the model well isn’t a reason to avoid the model (the difficulties of the bad fit may not overwhelm the advantages of the good fit), but it’s useful to spot and highlight these bad fit cases.

Going through your features and assessing your data needs should lead you to one or more alternatives for how to handle your database needs. This will give you a starting point, but the next step is to try things out by actually building software. Take some initial features and build them, while paying close attention to how straightforward it is to use the technology you’re considering. In this situation, it may be worthwhile to build the same features with a couple of different databases in order to see which works best. People are often reluctant to do this; no one likes to build software that will be discarded. Yet this is an essential way to judge how effective a particular framework is.

Sadly, there is no way to properly measure how productive different designs are. We have no way of properly measuring output. Even if you build exactly the same feature, you can’t truly compare the productivity because knowledge of building it once makes it easier a second time, and you can’t build them simultaneously with identical teams. What you can do is ensure the people who did the work can give an opinion. Most developers can sense when they are more productive in one environment than another. Although this is a subjective judgment, and you may well get disagreements between team members, this is the best judgment you will get. In the end we believe the team doing the work should decide.

When trying out a database to judge productivity, it’s important to also try out some of the bad fit cases we mentioned earlier. That way the team can get a feeling of both the happy path and the difficult one, to gain an overall impression.

This approach has its flaws. Often you can’t get a full appreciation of a technology without spending many months using it, and running an assessment for that long is rarely cost-effective. But like many things in life, we need to make the best assessment we can, knowing its flaws, and go with that. The essential thing here is to base the decision on as much real programming as you can. Even a mere week working with a technology can tell you things you’d never learn from a hundred vendor presentations.
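The difference between transforming an aggregate into relations and persisting it as it is used can be seen in a small experiment like the one below. It is only a sketch under our own assumptions: the order data is invented, SQLite (via Python's `sqlite3` module) plays the relational database, and a plain dictionary holding JSON text stands in for an aggregate-oriented store.

```python
import json
import sqlite3

order = {
    "id": 99,
    "customer": "Martin",
    "lines": [
        {"product": "NoSQL Distilled", "quantity": 2},
        {"product": "Refactoring Databases", "quantity": 1},
    ],
}

# Relational route: split the aggregate across tables, then reassemble it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_lines (order_id INTEGER, product TEXT, quantity INTEGER)")
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(order["id"], line["product"], line["quantity"]) for line in order["lines"]],
)
customer = db.execute("SELECT customer FROM orders WHERE id = ?", (99,)).fetchone()[0]
rows = db.execute(
    "SELECT product, quantity FROM order_lines WHERE order_id = ? ORDER BY rowid",
    (99,),
).fetchall()
from_relations = {
    "id": 99,
    "customer": customer,
    "lines": [{"product": product, "quantity": quantity} for product, quantity in rows],
}

# Aggregate route: store and fetch the whole order under one key.
aggregate_store = {}   # stand-in for a key-value or document store
aggregate_store["order:99"] = json.dumps(order)
from_aggregate = json.loads(aggregate_store["order:99"])
```

Both routes return the same order; what differs is how much mapping code sits between the application's aggregate and the stored form.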

15.2. Data-Access Performance

The concern that led to the growth of NoSQL databases was rapid access to lots of data. As large websites emerged, they wanted to grow horizontally and run on large clusters. They developed the early NoSQL databases to help them run efficiently on such architectures. As other data users follow their lead, again the focus is on accessing data rapidly, often with large volumes involved.

There are many factors that can determine a database’s better performance than the relational default in various circumstances. An aggregate-oriented database may be very fast for reading or retrieving aggregates compared to a relational database where data is spread over many tables. Easier sharding and replication over clusters allows horizontal scaling. A graph database can retrieve highly connected data more quickly than using relational joins.

If you’re investigating NoSQL databases based on performance, the most important thing you must do is to test their performance in the scenarios that matter to you. Reasoning about how a database may perform can help you build a short list, but the only way you can assess performance properly is to build something, run it, and measure it.

When building a performance assessment, the hardest thing is often getting a realistic set of performance tests. You can’t build your actual system, so you need to build a representative subset. It’s important, however, for this subset to be as faithful a representative as possible. It’s no good taking a database that’s intended to serve hundreds of concurrent users and assessing its performance with a single user. You are going to need to build representative loads and data volumes.

Particularly if you are building a public website, it can be difficult to build a high-load test bed. Here, a good argument can be made for using cloud computing resources both to generate load and to build a test cluster. The elastic nature of cloud provisioning is very helpful for short-lived performance assessment work.

You’re not going to be able to test every way in which your application will be used, so you need to build a representative subset. Choose scenarios that are the most common, the most performance-dependent, and those that don’t seem to fit your database model well. The last of these may alert you to any risks outside of your main use cases.

Coming up with volumes to test for can be tricky, especially early on in a project when it’s not clear what your production volumes are likely to be. You will have to come up with something to base your thinking on, so be sure to make it explicit and to communicate it with all the stakeholders. Making it explicit reduces the chance that different people have varying ideas on what a “heavy read load” is. It also allows you to spot problems more easily should your later discoveries wander off your original assumptions. Without making your assumptions explicit, it’s easier to drift away from them without realizing you need to redo your test bed as you learn new information.
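One way to keep the load assumptions explicit is to write them down as data that the test harness itself consumes. The sketch below does this with hypothetical names (`LoadAssumptions`, `run_read_test`, `percentile`) and a dictionary lookup standing in for a real database call; the numbers are placeholders, not recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadAssumptions:
    """Explicit, shareable statement of what the test simulates."""
    concurrent_users: int
    reads_per_user: int
    record_count: int


def run_read_test(read_fn, assumptions):
    """Run reads from several threads and return sorted latencies in seconds."""

    def one_user(user_number):
        latencies = []
        for i in range(assumptions.reads_per_user):
            key = (user_number * assumptions.reads_per_user + i) % assumptions.record_count
            started = time.perf_counter()
            read_fn(key)
            latencies.append(time.perf_counter() - started)
        return latencies

    with ThreadPoolExecutor(max_workers=assumptions.concurrent_users) as pool:
        per_user = list(pool.map(one_user, range(assumptions.concurrent_users)))
    return sorted(latency for user in per_user for latency in user)


def percentile(sorted_values, fraction):
    """Pick the value at the given fraction of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Stand-in data store: a dictionary lookup instead of a real database call.
assumptions = LoadAssumptions(concurrent_users=8, reads_per_user=50, record_count=1000)
fake_store = {key: f"value-{key}" for key in range(assumptions.record_count)}
latencies = run_read_test(fake_store.__getitem__, assumptions)
p95 = percentile(latencies, 0.95)
```

Because the `LoadAssumptions` values live in one place, a stakeholder who disagrees with "8 concurrent users" can say so, and the test bed can be rerun when the assumption changes.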

15.3. Sticking with the Default

Naturally we think that NoSQL is a viable option in many circumstances; otherwise we wouldn’t have spent several months writing this book. But we also realize that there are many cases, indeed the majority of cases, where you’re better off sticking with the default option of a relational database.

Relational databases are well known; you can easily find people with the experience of using them. They are mature, so you are less likely to run into the rough edges of new technology. There are lots of tools that are built on relational technology that you can take advantage of. You also don’t have to deal with the political issues of making an unusual choice: picking a new technology will always introduce a risk of problems should things run into difficulties.

So, on the whole, we tend to take a view that to choose a NoSQL database you need to show a real advantage over relational databases for your situation. There’s no shame in doing the assessments for programmability and performance, finding no clear advantage, and staying with the relational option. We think there are many cases where it is advantageous to use NoSQL databases, but “many” does not mean “all” or even “most.”

15.4. Hedging Your Bets

One of the greatest difficulties we have in giving advice on choosing a data-storage option is that we don’t have that much data to go on. As we write this, we are only seeing very early adopters discussing their experiences with these technologies, so we don’t have a clear picture of the actual pros and cons.

With the situation this uncertain, there’s more of an argument for encapsulating your database choice: keeping all your database code in a section of your codebase that is relatively easy to replace should you decide to change your database choice later. The classic way to do this is through an explicit data store layer in your application, using patterns such as Data Mapper and Repository [Fowler PoEAA]. Such an encapsulation layer does carry a cost, particularly when you are unsure about using quite different models, such as key-value versus graph data models. Worse still, we don’t have experience yet with encapsulating data layers between these very different kinds of data stores.

On the whole, our advice is to encapsulate as a default strategy, but pay attention to the cost of the insulating layer. If it’s getting too much of a burden, for example by making it harder to use some helpful database features, then it’s a good argument for using the database that has those features. This information may be just what you need to make a database choice and thus eliminate the encapsulation.

This is another argument for decomposing the database layer into services that encapsulate data storage (“Service Usage over Direct Data Store Usage,” Section 13.3). As well as reducing coupling between various services, this has the additional advantage of making it easier to replace a database should things not work out in the future. This is a plausible approach even if you end up using the same database everywhere: should things go badly, you can gradually swap it out, focusing on the most problematic services first.

This design advice applies just as much if you prefer to stick with a relational option. By encapsulating segments of your database into services, you can replace parts of your data store with a NoSQL technology as it matures and the advantages become clearer.
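A data store layer of the kind discussed in this section could be sketched as below, loosely following the Repository idea. The interface `CustomerRepository`, both implementations, and the `rename_customer` function are hypothetical; SQLite and a dictionary stand in for whichever relational and key-value products are actually under consideration.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class CustomerRepository(ABC):
    """The only data access interface the rest of the application sees."""

    @abstractmethod
    def save(self, customer_id, customer): ...

    @abstractmethod
    def find(self, customer_id): ...


class InMemoryCustomerRepository(CustomerRepository):
    """Stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def save(self, customer_id, customer):
        self._data[customer_id] = dict(customer)

    def find(self, customer_id):
        found = self._data.get(customer_id)
        return dict(found) if found is not None else None


class SqliteCustomerRepository(CustomerRepository):
    """SQL-backed implementation behind the same interface."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, body TEXT)")

    def save(self, customer_id, customer):
        self._db.execute(
            "INSERT OR REPLACE INTO customers VALUES (?, ?)",
            (customer_id, json.dumps(customer)),
        )

    def find(self, customer_id):
        row = self._db.execute(
            "SELECT body FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def rename_customer(repository, customer_id, new_name):
    """Application code that does not know which store is in use."""
    customer = repository.find(customer_id)
    customer["name"] = new_name
    repository.save(customer_id, customer)


results = []
for repository in (InMemoryCustomerRepository(), SqliteCustomerRepository()):
    repository.save("c1", {"name": "Pramod"})
    rename_customer(repository, "c1", "Pramod Sadalage")
    results.append(repository.find("c1"))
```

The interface is deliberately narrow. The cost mentioned above shows up as soon as one store offers a useful feature, such as a graph traversal, that this common interface cannot express.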

15.5. Key Points

• The two main reasons to use NoSQL technology are:

  • To improve programmer productivity by using a database that better matches an application’s needs.

  • To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.

• It’s essential to test your expectations about programmer productivity and/or performance before committing to using a NoSQL technology.

• Service encapsulation supports changing data storage technologies as needs and technology evolve. Separating parts of applications into services also allows you to introduce NoSQL into an existing application.

• Most applications, particularly nonstrategic ones, should stick with relational technology, at least until the NoSQL ecosystem becomes more mature.

15.6. Final Thoughts

We hope you’ve found this book enlightening. When we started writing it, we were frustrated by the lack of anything that would give us a broad survey of the NoSQL world. In writing this book we had to make that survey ourselves, and we’ve found it an enjoyable journey. We hope your journey through this material is considerably quicker but no less enjoyable.

At this point you may be considering making use of a NoSQL technology. If so, this book is only an early step in building your understanding. We urge you to download some databases and work with them, for we’re of the firm conviction that you can only understand a technology properly by working with it, finding its strengths and the inevitable gotchas that never make it into the documentation.

We expect that most people, including most readers of this book, will not be using NoSQL for a while. It is a new technology and we are still early in the process of understanding when to use it and how to use it well. But as with anything in the software world, things are changing more rapidly than we dare predict, so do keep an eye on what’s happening in this field.

We hope you’ll also find other books and articles to help you. We think the best material on NoSQL will be written after this book is done, so we can’t point you to anywhere in particular as we write this. We do have an active presence on the Web, so for our more up-to-date thoughts on the NoSQL world take a look at www.sadalage.com and http://martinfowler.com/nosql.html.


Bibliography

[Agile Methods] www.agilealliance.org.

[Amazon’s Dynamo] www.allthingsdistributed.com/2007/10/amazons_dynamo.html.

[Amazon DynamoDB] http://aws.amazon.com/dynamodb.

[Amazon SimpleDB] http://aws.amazon.com/simpledb.

[Ambler and Sadalage] Ambler, Scott and Pramodkumar Sadalage. Refactoring Databases: Evolutionary Database Design. Addison-Wesley. 2006. ISBN 978-0321293534.

[Berkeley DB] www.oracle.com/us/products/database/berkeley-db.

[Blueprints] https://github.com/tinkerpop/blueprints/wiki.

[Brewer] Brewer, Eric. Towards Robust Distributed Systems. www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.

[Cages] http://code.google.com/p/cages.

[Cassandra] http://cassandra.apache.org.

[Chang etc.] Chang, Fay, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. http://research.google.com/archive/bigtable-osdi06.pdf.

[CouchDB] http://couchdb.apache.org.

[CQL] www.slideshare.net/jericevans/cql-sql-in-cassandra.

[CQRS] http://martinfowler.com/bliki/CQRS.html.

[C-Store] Stonebraker, Mike, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O’Neil, Pat O’Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. http://db.csail.mit.edu/projects/cstore/vldb.pdf.

[Cypher] http://docs.neo4j.org/chunked/1.6.1/cypher-query-lang.html.

[Daigneau] Daigneau, Robert. Service Design Patterns. Addison-Wesley. 2012. ISBN 032154420X.

[DBDeploy] http://dbdeploy.com.

[DBMaintain] www.dbmaintain.org.

[Dean and Ghemawat] Dean, Jeffrey and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf.

[Dijkstra’s] http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm.

[Evans] Evans, Eric. Domain-Driven Design. Addison-Wesley. 2004. ISBN 0321125215.

[FlockDB] https://github.com/twitter/flockdb.


[Fowler DSL] Fowler, Martin. Domain-Specific Languages. Addison-Wesley. 2010. ISBN 0321712943.

[Fowler lmax] Fowler, Martin. The LMAX Architecture. http://martinfowler.com/articles/lmax.html.

[Fowler PoEAA] Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley. 2003. ISBN 0321127420.

[Fowler UML] Fowler, Martin. UML Distilled. Addison-Wesley. 2003. ISBN 0321193687.

[Gremlin] https://github.com/tinkerpop/gremlin/wiki.

[Hadoop] http://hadoop.apache.org/mapreduce.

[HamsterDB] http://hamsterdb.com.

[Hbase] http://hbase.apache.org.

[Hector] https://github.com/rantav/hector.

[Hive] http://hive.apache.org.

[Hohpe and Woolf] Hohpe, Gregor and Bobby Woolf. Enterprise Integration Patterns. Addison-Wesley. 2003. ISBN 0321200683.

[HTTP] Fielding, R., J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. www.w3.org/Protocols/rfc2616/rfc2616.html.

[Hypertable] http://hypertable.org.

[InfiniteGraph] www.infinitegraph.com.

[JSON] http://json.org.

[LevelDB] http://code.google.com/p/leveldb.

[Liquibase] www.liquibase.org.

[Lucene] http://lucene.apache.org.

[Lynch and Gilbert] Lynch, Nancy and Seth Gilbert. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf.

[Memcached] http://memcached.org.

[MongoDB] www.mongodb.org.

[Monitoring] www.mongodb.org/display/DOCS/MongoDB+Monitoring+Service.

[MyBatis Migrator] http://mybatis.org.

[Neo4J] http://neo4j.org.

[NoSQL Debrief] http://blog.oskarsson.nu/post/22996140866/nosql-debrief.

[NoSQL Meetup] http://nosql.eventbrite.com.


[Notes Storage Facility] http://en.wikipedia.org/wiki/IBM_Lotus_Domino.

[OpsCenter] www.datastax.com/products/opscenter.

[OrientDB] www.orientdb.org.

[Oskarsson] Private Correspondence.

[Pentaho] www.pentaho.com.

[Pig] http://pig.apache.org.

[Pritchett] www.infoq.com/interviews/dan-pritchett-ebay-architecture.

[Project Voldemort] http://project-voldemort.com.

[RavenDB] http://ravendb.net.

[Redis] http://redis.io.

[Rekon] https://github.com/basho/rekon.

[Riak] http://wiki.basho.com/Riak.html.

[Solr] http://lucene.apache.org/solr.

[Strozzi NoSQL] www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/NoSQL.

[Tanenbaum and Van Steen] Tanenbaum, Andrew and Maarten Van Steen. Distributed Systems. Prentice-Hall. 2007. ISBN 0132392275.

[Terrastore] http://code.google.com/p/terrastore.

[Vogels] Vogels, Werner. Eventually Consistent - Revisited. www.allthingsdistributed.com/2008/12/eventually_consistent.html.

[Webber Neo4J Scaling] http://jim.webber.name/2011/03/22/ef4748c3-6459-40b6-bcfa-818960150e0f.aspx.

[ZooKeeper] http://zookeeper.apache.org.
