![Page 1: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/1.jpg)
CompSci 516 Data Intensive Computing Systems
Lecture 18
NoSQL and Column Store
Instructor: Sudeepa Roy
Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems
![Page 2: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/2.jpg)
Announcements
• HW3 (last HW) has been posted on Sakai
• Same problems as in HW1, but in MongoDB (NoSQL)
• Due two weeks after today’s lecture (~11/16)
![Page 3: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/3.jpg)
Reading Material
NoSQL:
• “Scalable SQL and NoSQL Data Stores”, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)
• see webpage http://cattell.net/datastores/ for updates and more pointers
Column Store:
• D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, and S. Madden. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases, vol. 5, no. 3, pp. 197–280, 2012.
• See VLDB 2009 tutorial: http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
Optional:
• “Dynamo: Amazon’s Highly Available Key-value Store”, Giuseppe DeCandia et al., SOSP 2007
• “Bigtable: A Distributed Storage System for Structured Data”, Fay Chang et al., OSDI 2006
![Page 4: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/4.jpg)
NoSQL
![Page 5: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/5.jpg)
![Page 6: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/6.jpg)
So far: RDBMS
• Relational Data Model
• Relational Database Systems (RDBMS)
• RDBMSs have
  – a complete pre-defined fixed schema
  – a SQL interface
  – and ACID transactions
![Page 7: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/7.jpg)
Today
• NoSQL: “new” database systems
  – not typically RDBMS
  – relax some requirements to gain efficiency and scalability
• New systems choose to use or not use several concepts we have learned so far
  – e.g. System X does not use locks but uses multi-version concurrency control (MVCC), or
  – System Y uses asynchronous replication
• Therefore, it is important to understand the basics (Lectures 1-17) even if they are not used in some new systems!
![Page 8: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/8.jpg)
Warnings!
• Material from Cattell’s paper (2010-11): some info will be outdated
  – see webpage http://cattell.net/datastores/ for updates and more pointers
• We will focus on the basic ideas of NoSQL systems
• Optional reading slides at the end
  – there are also comparison tables in Cattell’s paper if you are interested
![Page 9: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/9.jpg)
OLAP vs. OLTP
• OLTP (OnLine Transaction Processing)
  – Recall transactions!
  – Multiple concurrent read-write requests
  – Commercial applications (banking, online shopping)
  – Data changes frequently
  – ACID properties, concurrency control, recovery
• OLAP (OnLine Analytical Processing)
  – Many aggregate/group-by queries
  – Multidimensional data
  – Data mostly static
  – Will study OLAP Cube soon
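The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The `accounts` table, its columns, and the amounts below are invented for illustration and are not part of the lecture; a real OLAP system would run over far larger, mostly static data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Durham", 100), (2, "Durham", 50), (3, "Raleigh", 200)],
)
conn.commit()

def transfer(conn, src, dst, amount):
    """OLTP-style: a short read-write transaction touching two rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balance_by_branch(conn):
    """OLAP-style: a read-only aggregate/group-by over the whole table."""
    rows = conn.execute(
        "SELECT branch, SUM(balance) FROM accounts GROUP BY branch ORDER BY branch"
    )
    return dict(rows.fetchall())

transfer(conn, 1, 3, 30)
totals = balance_by_branch(conn)
```

The transfer touches two rows and must be atomic; the aggregate reads every row but changes nothing.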
![Page 10: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/10.jpg)
New Systems
• We will examine a number of SQL and so-called “NoSQL” systems or “data stores”
• Designed to scale simple OLTP-style application loads
  – to do updates as well as reads
  – in contrast to traditional DBMSs and data warehouses
  – to provide good horizontal scalability for simple read/write database operations distributed over many servers
• Originally motivated by Web 2.0 applications
  – these systems are designed to scale to thousands or millions of users
![Page 11: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/11.jpg)
New Systems vs. RDBMS
• When you study a new system, compare it with RDBMSs on its
  – data model
  – consistency mechanisms
  – storage mechanisms
  – durability guarantees
  – availability
  – query support
• These systems typically sacrifice some of these dimensions
  – e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability
![Page 12: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/12.jpg)
NoSQL
• Many of the new systems are referred to as “NoSQL” data stores
• NoSQL stands for “Not Only SQL” or “Not Relational”
  – not entirely agreed upon
• Next: six key features of NoSQL systems
![Page 13: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/13.jpg)
NoSQL: Six Key Features
1. the ability to horizontally scale “simple operations” throughput over many servers
2. the ability to replicate and to distribute (partition) data over many servers
3. a simple call-level interface or protocol (in contrast to a SQL binding)
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems
5. efficient use of distributed indexes and RAM for data storage
6. the ability to dynamically add new attributes to data records
![Page 14: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/14.jpg)
Important Examples of New Systems
• Three systems provided a “proof of concept” and inspired many other data stores
1. Memcached
2. Amazon’s Dynamo
3. Google’s BigTable
![Page 15: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/15.jpg)
1. Memcached: main features
• popular open source cache
• supports distributed hashing (later)
• demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes
![Page 16: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/16.jpg)
2. Dynamo: main features
• pioneered the idea of eventual consistency as a way to achieve higher availability and scalability
• data fetched are not guaranteed to be up-to-date
• but updates are guaranteed to be propagated to all nodes eventually
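A toy simulation may help make eventual consistency concrete. This is a minimal sketch, not Dynamo's actual protocol: the class names, the `pending` queue, and the explicit `propagate()` step are all assumptions for illustration, and conflict resolution between concurrent writers is left out entirely.

```python
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is queued for the others."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet delivered

    def put(self, key, value, at=0):
        self.replicas[at].data[key] = value
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, key, value))

    def get(self, key, at=0):
        # No coordination: returns whatever this replica currently holds.
        return self.replicas[at].data.get(key)

    def propagate(self):
        """Deliver all queued updates (stands in for background replication)."""
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.put("cart:42", ["book"], at=0)
stale = store.get("cart:42", at=2)   # replica 2 has not seen the write yet
store.propagate()
fresh = store.get("cart:42", at=2)   # after propagation, all replicas agree
```

The read before `propagate()` returns nothing, which is the "not guaranteed to be up-to-date" case; the read afterwards sees the update.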
![Page 17: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/17.jpg)
3. BigTable: main features
• demonstrated that persistent record storage could be scaled to thousands of nodes
![Page 18: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/18.jpg)
BASE (not ACID)
• Recall ACID for RDBMS: desired properties of transactions
  – Atomicity, Consistency, Isolation, and Durability
• NoSQL systems typically do not provide ACID
• Basically Available
• Soft state
• Eventually consistent
![Page 19: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/19.jpg)
ACID vs. BASE
• The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability
• The systems differ in how much they give up
  – e.g. most of the systems call themselves “eventually consistent”, meaning that updates are eventually propagated to all nodes
  – but many of them provide mechanisms for some degree of consistency, such as multi-version concurrency control (MVCC)
![Page 20: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/20.jpg)
“CAP” Theorem
• Eric Brewer’s CAP theorem is often cited for NoSQL
• A system can have only two out of three of the following properties:
  – Consistency
  – Availability
  – Partition-tolerance
• The NoSQL systems generally give up consistency
  – However, the trade-offs are complex
![Page 21: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/21.jpg)
Two foci for NoSQL systems
1. “Simple” operations
2. Horizontal Scalability
![Page 22: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/22.jpg)
1. “Simple” Operations
• Reading or writing a small number of related records in each operation
  – e.g. key lookups
  – reads and writes of one record or a small number of records
• This is in contrast to complex queries, joins, or read-mostly access
• Inspired by the web, where millions of users may both read and write data in simple operations
  – e.g. search and update multi-server databases of electronic mail, personal profiles, web postings, wikis, customer records, online dating records, classified ads, and many other kinds of data
![Page 23: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/23.jpg)
2. Horizontal Scalability
• Shared-Nothing Horizontal Scaling
• The ability to distribute both the data and the load of these simple operations over many servers
  – with no RAM or disk shared among the servers
• Not “vertical” scaling
  – where a database system utilizes many cores and/or CPUs that share RAM and disks
• Some of the systems we describe provide both vertical and horizontal scalability
![Page 24: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/24.jpg)
2. Horizontal vs. Vertical Scaling
• Effective use of multiple cores (vertical scaling) is important
  – but the number of cores that can share memory is limited
• Horizontal scaling is generally less expensive
  – can use commodity servers
• Note: horizontal and vertical partitioning are not related to horizontal and vertical scaling
  – except that they are both useful for horizontal scaling (Lecture 17)
![Page 25: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/25.jpg)
What is different in NoSQL systems
• When you study a new NoSQL system, notice how it differs from RDBMS in terms of
1. Concurrency Control
2. Data Storage Medium
3. Replication
4. Transactions
![Page 26: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/26.jpg)
Choices in NoSQL systems: 1. Concurrency Control
a) Locks
  – some systems provide one-user-at-a-time read or update locks
  – MongoDB provides locking at a field level
b) MVCC
c) None
  – do not provide atomicity
  – multiple users can edit in parallel
  – no guarantee which version you will read
d) ACID
  – pre-analyze transactions to avoid conflicts
  – no deadlocks and no waits on locks
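Option (b) is listed without detail, so here is a minimal sketch of the MVCC idea under simplifying assumptions: a single process, a hypothetical global counter as the timestamp source, and every write committing immediately. Real systems add garbage collection of old versions and write-conflict detection, which are omitted here.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._clock = itertools.count(1)   # hypothetical global timestamp source
        self._versions = {}                # key -> list of (commit_ts, value), ascending
        self._last_ts = 0

    def write(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_ts = ts
        return ts

    def snapshot(self):
        """A reader remembers the latest commit timestamp when it starts."""
        return self._last_ts

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        result = None
        for ts, value in self._versions.get(key, []):
            if ts <= snapshot_ts:
                result = value
            else:
                break
        return result

store = MVCCStore()
store.write("x", "v1")
snap = store.snapshot()          # reader starts here
store.write("x", "v2")           # a later writer is not blocked by the reader
old_view = store.read("x", snap)
new_view = store.read("x", store.snapshot())
```

The reader that took its snapshot before the second write keeps seeing `v1` without taking any lock, while a fresh reader sees `v2`.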
![Page 27: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/27.jpg)
Choices in NoSQL systems: 2. Data Storage Medium
a) Storage in RAM
  – snapshots or replication to disk
  – poor performance when the data overflows RAM
b) Disk storage
  – caching in RAM
![Page 28: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/28.jpg)
Choices in NoSQL systems: 3. Replication
• whether mirror copies are always in sync
a) Synchronous
b) Asynchronous
  – faster, but updates may be lost in a crash
c) Both
  – local copies synchronously, remote copies asynchronously
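The durability trade-off between the two modes can be sketched as follows. This is a toy model with one primary and one mirror; the class, the `unsent` buffer, and the crash method are hypothetical names chosen for illustration.

```python
class ReplicatedLog:
    """Toy primary with one mirror; `synchronous` picks the replication mode."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.mirror = []
        self.unsent = []   # updates acknowledged to the client but not yet mirrored

    def write(self, update):
        self.primary.append(update)
        if self.synchronous:
            self.mirror.append(update)     # wait for the mirror before acknowledging
        else:
            self.unsent.append(update)     # acknowledge now, ship later
        return "ack"

    def flush(self):
        """Background shipping of buffered updates to the mirror."""
        self.mirror.extend(self.unsent)
        self.unsent.clear()

    def crash_primary(self):
        """Primary is lost; whatever the mirror holds is all that survives."""
        self.primary, self.unsent = [], []
        return list(self.mirror)

sync_log = ReplicatedLog(synchronous=True)
async_log = ReplicatedLog(synchronous=False)
for log in (sync_log, async_log):
    log.write("u1")
    log.write("u2")

survived_sync = sync_log.crash_primary()
survived_async = async_log.crash_primary()
```

Both logs acknowledged both writes, but only the synchronous one still has them after the crash, because the asynchronous primary failed before `flush()` ran.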
![Page 29: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/29.jpg)
Choices in NoSQL systems: 4. Transaction Mechanisms
a) support
b) do not support
c) in between
  – support local transactions only within a single object or “shard”
  – shard = a horizontal partition of data in a database
![Page 30: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/30.jpg)
Comparison from Cattell’s paper (2011)
![Page 31: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/31.jpg)
Data Model Terminology for NoSQL
• Unlike SQL/RDBMS, the terminology for NoSQL is often inconsistent
  – we are following the notation in Cattell’s paper
• All systems provide a way to store scalar values
  – e.g. numbers and strings
• Some of them also provide a way to store more complex nested or reference values
![Page 32: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/32.jpg)
Data Model Terminology for NoSQL
• The systems all store sets of attribute-value pairs
  – but use four different data structures
1. Tuple
2. Document
3. Extensible Record
4. Object
![Page 33: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/33.jpg)
1. Tuple
• Same as before
• A “tuple” is a row in a relational table
  – attribute names are pre-defined in a schema
  – the values must be scalar
  – the values are referenced by attribute name
  – in contrast to an array or list, where they are referenced by ordinal position
![Page 34: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/34.jpg)
2. Document
• Allows values to be nested documents or lists as well as scalar values
• The attribute names are dynamically defined for each document at runtime
• A document differs from a tuple in that the attributes are not defined in a global schema
  – and a wider range of values are permitted
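The difference between a tuple and a document can be illustrated in plain Python, using a `namedtuple` as a stand-in for a schema-bound row and a `dict` as a stand-in for a document. The driver and vehicle fields are made-up examples.

```python
from collections import namedtuple

# Tuple: attribute names fixed up front by a schema, scalar values only.
Driver = namedtuple("Driver", ["license_no", "name", "birth_year"])
row = Driver(license_no="D123", name="Ada", birth_year=1990)

# Document: attribute names chosen per document; values may be nested.
doc_a = {
    "license_no": "D123",
    "name": "Ada",
    "vehicles": [{"plate": "ABC-1", "make": "Honda"}],  # nested list of documents
}
doc_b = {"license_no": "D456", "name": "Bob", "organ_donor": True}  # different attributes

def attribute_names(document):
    return sorted(document.keys())

# The schema-bound tuple rejects an attribute it was not declared with.
try:
    Driver(license_no="D789", name="Cy", birth_year=1985, organ_donor=True)
    schema_rejected = False
except TypeError:
    schema_rejected = True
```

Both documents sit happily in the same collection even though their attribute sets differ, whereas the tuple type refuses the extra attribute.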
![Page 35: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/35.jpg)
3. Extensible Record
• A hybrid between a tuple and a document
• families of attributes are defined in a schema
• but new attributes can be added (within an attribute family) on a per-record basis
• Attributes may be list-valued
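A small sketch of the hybrid: the set of attribute families is fixed when the table is created, while attributes inside a family are free-form per record. The class and the family names (`core`, `activity`) are assumptions for illustration, not the API of any particular system.

```python
class ExtensibleRecordTable:
    """Toy table: attribute families are schema-level, attributes are per-record."""

    def __init__(self, families):
        self.families = set(families)      # fixed in the schema
        self.rows = {}                     # row_key -> {family: {attribute: value}}

    def put(self, row_key, family, attribute, value):
        if family not in self.families:
            raise KeyError(f"unknown attribute family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[attribute] = value

table = ExtensibleRecordTable(families=["core", "activity"])
table.put("cust1", "core", "email", "a@example.com")
table.put("cust1", "activity", "bids", [101, 205])   # list-valued attribute
table.put("cust2", "core", "phone", "555-0100")      # attribute cust1 never had

try:
    table.put("cust2", "billing", "card", "xxxx")    # family not in the schema
    family_rejected = False
except KeyError:
    family_rejected = True
```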
![Page 36: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/36.jpg)
4. Object
• Analogous to an object in programming languages
  – but without the procedural methods
• Values may be references or nested objects
![Page 37: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/37.jpg)
Data Store Categories
• The data stores are grouped according to their data model
• Key-value Stores:
  – store values and an index to find them
  – based on a programmer-defined key
• Document Stores:
  – store documents
  – the documents are indexed and a simple query mechanism is provided
• Extensible Record Stores:
  – store extensible records that can be partitioned vertically and horizontally across nodes
  – some papers call these “wide column stores”
• Relational Databases:
  – store (and index and query) tuples
  – e.g. the new RDBMSs that provide horizontal scaling
![Page 38: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/38.jpg)
Example NoSQL systems
• Key-value Stores:
  – Project Voldemort, Riak, Redis, Scalaris, Tokyo Cabinet, Memcached/Membrain/Membase
• Document Stores:
  – Amazon SimpleDB, CouchDB, MongoDB, Terrastore
• Extensible Record Stores:
  – HBase, HyperTable, Cassandra, Yahoo’s PNUTS
• Relational Databases:
  – MySQL Cluster, VoltDB, Clustrix, ScaleDB, ScaleBase, NimbusDB, Google Megastore (a layer on BigTable)
![Page 39: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/39.jpg)
Key-value store: 1/2
• The simplest data stores
• data model similar to the memcached distributed in-memory cache
  – with a single key-value index for all the data
  – does not provide secondary indices or keys
• but unlike memcached, generally provide
  – a persistence mechanism
  – additional functionality like replication, versioning, locking, transactions, sorting, etc.
• The client interface provides inserts, deletes, and index lookups
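The client interface described above is small enough to sketch directly. This is a single-node, in-memory toy with hypothetical method names; persistence, replication, and the other extras listed on the slide are deliberately left out.

```python
class KeyValueStore:
    """Toy single-node store with one index: the programmer-defined key."""

    def __init__(self):
        self._index = {}

    def insert(self, key, value):
        self._index[key] = value

    def delete(self, key):
        self._index.pop(key, None)

    def lookup(self, key):
        return self._index.get(key)

    def items(self):
        return self._index.items()

kv = KeyValueStore()
kv.insert("user:7", {"name": "Ada", "city": "Durham"})
found = kv.lookup("user:7")

# There is no secondary index: finding users by city means scanning every value.
in_durham = [k for k, v in kv.items() if v["city"] == "Durham"]

kv.delete("user:7")
gone = kv.lookup("user:7")
```

The scan in the middle is the price of having only one index: any access path other than the key touches every stored value.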
![Page 40: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/40.jpg)
Key-value store: 2/2
• All key-value stores provide scalability through key distribution over nodes
• Voldemort, Riak, Tokyo Cabinet, and enhanced memcached systems can store data in RAM or on disk
  – The others store data in RAM, and provide disk as backup, or rely on replication and recovery so that a backup is not needed
• Scalaris and enhanced memcached systems use synchronous replication
  – the rest use asynchronous
• Scalaris and Tokyo Cabinet implement transactions
  – the others do not
• Voldemort and Riak use multi-version concurrency control
  – the others use locks
![Page 41: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/41.jpg)
Use Case: Key-value store
• if you have a simple application with only one kind of object, and you only need to look up objects based on one attribute
• Suppose you have a web application
  – that does many RDBMS queries to create a tailored page when a user logs in
  – Suppose it takes several seconds to execute those queries, and the user’s data is rarely changed
  – you might want to store the user’s tailored page as a single object in a key-value store
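The tailored-page scenario can be sketched as a cache in front of the slow queries. Everything here is illustrative: a plain dict plays the key-value store, `build_tailored_page` is a stand-in for the RDBMS work, and the invalidation hook is one possible policy, not something the slide prescribes.

```python
stats = {"builds": 0}

def build_tailored_page(user_id):
    """Stand-in for the several seconds of RDBMS queries described above."""
    stats["builds"] += 1
    return f"<html>Welcome back, user {user_id}</html>"

page_cache = {}   # plays the role of the key-value store; key = user id

def get_page(user_id):
    page = page_cache.get(user_id)
    if page is None:                     # miss: do the slow work once
        page = build_tailored_page(user_id)
        page_cache[user_id] = page
    return page

def on_user_data_changed(user_id):
    page_cache.pop(user_id, None)        # invalidate so the next login rebuilds

first = get_page(42)
second = get_page(42)                    # served from the cache
on_user_data_changed(42)
third = get_page(42)                     # rebuilt after invalidation
```

Because the user's data rarely changes, most logins hit the cached object and skip the queries entirely.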
![Page 42: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/42.jpg)
Document store: 1/3
• Document stores support more complex data than the key-value stores
• “document store” may be confusing
  – these systems could store “documents” in the traditional sense (articles, Microsoft Word files, etc.)
  – but a document in these systems can be any kind of “pointerless object”
• Unlike the key-value stores, these systems generally support
  – secondary indexes
  – multiple types of documents (objects) per database, and
  – nested documents or lists
• Like other NoSQL systems, the document stores do not provide ACID transactional properties
![Page 43: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/43.jpg)
Document store: 2/3
• The document stores are schema-less, except for
  – attributes (which are simply a name, and are not pre-specified)
  – collections (which are simply a grouping of documents), and
  – indexes defined on collections (explicitly defined, except in SimpleDB)
  – There are some differences in their data models, e.g. SimpleDB does not allow nested documents
• The document stores are very similar but use different terminology
  – e.g. a SimpleDB Domain = CouchDB Database = MongoDB Collection (= Terrastore Bucket)
  – SimpleDB calls documents “items”
  – an attribute is a field in CouchDB, or a key in MongoDB (or Terrastore)
![Page 44: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/44.jpg)
Document store: 3/3
• Unlike the key-value stores, the document stores “typically” provide a mechanism to query collections based on multiple attribute value constraints
• do not provide explicit locks
  – have weaker concurrency and atomicity properties than traditional ACID-compliant databases
• Documents can be distributed over nodes in all of the systems
  – All of the systems can achieve scalability by reading (potentially) out-of-date replicas
![Page 45: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/45.jpg)
Use case: Document Store
• application with multiple different kinds of objects
  – e.g. in a Department of Motor Vehicles application, with vehicles and drivers
• where you need to look up objects based on multiple fields
  – e.g. a driver’s name, license number, owned vehicle, or birth date
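A sketch of what "lookup on multiple fields" looks like over a mixed collection. The `find` helper and the sample records are hypothetical; it scans linearly, whereas a real document store would use secondary indexes to answer the same query.

```python
collection = [
    {"type": "driver", "name": "Ada", "license_no": "D123", "birth_year": 1990,
     "vehicles": ["ABC-1"]},
    {"type": "driver", "name": "Bob", "license_no": "D456", "birth_year": 1990,
     "vehicles": []},
    {"type": "vehicle", "plate": "ABC-1", "make": "Honda"},
]

def find(collection, **constraints):
    """Return documents whose attributes equal every given constraint.

    Documents lacking a constrained attribute simply do not match.
    """
    missing = object()
    return [
        doc for doc in collection
        if all(doc.get(attr, missing) == value for attr, value in constraints.items())
    ]

born_1990 = find(collection, type="driver", birth_year=1990)
ada = find(collection, name="Ada", license_no="D123")
hondas = find(collection, type="vehicle", make="Honda")
```

Drivers and vehicles live in one collection with different attributes, and the same query mechanism works for either kind of object.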
![Page 46: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/46.jpg)
Extensible Record Stores: 1/1
• Motivated by Google’s success with BigTable
  – still, the recent extensible record stores cannot come close to BigTable’s scalability
• Basic data model is rows and columns
• Basic scalability model is splitting both rows and columns over multiple nodes
• Rows are split across nodes through sharding on the primary key
  – They typically split by range rather than a hash function
• Columns of a table are distributed over multiple nodes by using “column groups”
  – a way for the customer to indicate which columns are best stored together
• Both horizontal and vertical partitioning can be used simultaneously on the same table
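The two partitioning dimensions can be combined in a few lines. This is a minimal sketch: the split points, the column-group names, and the column names are invented for illustration, and real systems move split points dynamically as ranges grow.

```python
import bisect

# Hypothetical layout: three row ranges by primary key, two column groups.
SPLIT_POINTS = ["h", "q"]   # keys < "h" -> shard 0, keys < "q" -> shard 1, else shard 2
COLUMN_GROUPS = {
    "core": {"address", "email"},
    "activity": {"current_bids"},
}

def row_shard(primary_key):
    """Range partitioning: binary-search the sorted split points."""
    return bisect.bisect_right(SPLIT_POINTS, primary_key)

def column_group(column):
    for group, columns in COLUMN_GROUPS.items():
        if column in columns:
            return group
    raise KeyError(column)

def locate(primary_key, column):
    """A cell lives at (row shard, column group): both partitionings apply at once."""
    return (row_shard(primary_key), column_group(column))

loc_a = locate("alice", "email")
loc_b = locate("mallory", "current_bids")
loc_c = locate("zed", "email")
```

Splitting by range rather than by hash keeps neighbouring keys on the same shard, which is what makes range scans over the primary key cheap.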
![Page 47: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/47.jpg)
Use case: Extensible Record Store
• use cases similar to those for document stores:
  – multiple kinds of objects, with lookups based on any field
• However, aimed at higher throughput, and may provide stronger concurrency guarantees,
  – at the cost of slightly more complexity than the document stores
• Suppose you are storing customer information for an eBay-style application, and you want to partition your data both horizontally and vertically:
  – cluster customers by country, so that you can efficiently search all of the customers in one country
  – separate the rarely-changed “core” customer information such as customer addresses and email addresses in one place, and
  – put certain frequently-updated customer information (such as current bids in progress) in a different place, to improve performance
![Page 48: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/48.jpg)
Scalable RDBMS: 1/1
• Some RDBMSs are expected to provide scalability comparable with NoSQL data stores
• But, with two provisos:
  – Use small-scope operations: operations that span many nodes, e.g. joins over many tables, will not scale well with sharding
  – Use small-scope transactions: likewise, transactions that span many nodes are going to be very inefficient, with the communication and 2PC overhead
• Typical NoSQL systems make these two impossible
• Scalable RDBMSs allow them, but penalize a customer for these operations
• Have a higher-level SQL language and ACID properties
  – but pay a price when they span nodes
![Page 49: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/49.jpg)
Use case: Scalable RDBMS
• If your application requires many tables with different types of data
  – a relational schema centralizes and simplifies data definition and SQL simplifies operations
  – or for projects with many programmers
• However, more useful if the application does not require
  – updates or joins that span many nodes
  – transaction coordination
  – or, data movement
![Page 50: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/50.jpg)
Consistent Hashing in DynamoDB
![Page 51: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/51.jpg)
Consistent Hashing (CH)
• Recall dynamic hashing schemes
• If the # of slots (directory size) changes, then almost all keys have to be remapped
• In consistent hashing (CH), with # keys = K and # slots = N, only K/N keys need to be remapped on average
• Applies to the design of Distributed Hash Tables (DHTs) for uniform load distribution
  – partition a keyspace among a set of sites/nodes
  – additionally provide an overlay network that connects nodes such that the nodes responsible for any key can be efficiently located
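To make the remapping cost concrete, the sketch below counts how many keys change slots when a table that places keys with `hash(key) mod N` grows from 10 to 11 slots. The helper names (`stable_hash`, `slot_mod`, `moved_fraction`) and the use of MD5 are illustrative assumptions, not something taken from the lecture.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 128-bit hash of a string (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def slot_mod(key: str, n_slots: int) -> int:
    """Ordinary hashing: the slot depends directly on the number of slots."""
    return stable_hash(key) % n_slots

def moved_fraction(keys, n_before: int, n_after: int) -> float:
    """Fraction of keys whose slot changes when the slot count changes."""
    moved = sum(1 for k in keys if slot_mod(k, n_before) != slot_mod(k, n_after))
    return moved / len(keys)

keys = [f"key-{i}" for i in range(10_000)]
frac = moved_fraction(keys, 10, 11)
print(f"mod-N hashing, 10 -> 11 slots: {frac:.1%} of keys remapped")
print(f"consistent hashing target: about {1 / 11:.1%} of keys (K/N on average)")
```

With a well-mixed hash, a key keeps its slot only when its hash has the same residue modulo 10 and modulo 11, which holds for about 1 hash in 11, so roughly 10/11 of the keys move. Consistent hashing aims for the opposite ratio.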
![Page 52: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/52.jpg)
DynamoDB: CH 1/2
• [ref. the Dynamo paper, secs 4.2 and 4.3]
• Must scale incrementally
• Consistent hashing is used to dynamically distribute data around a “ring” of nodes (= sites)
• The output range of a hash function is treated as a circular ring
• Each node is assigned a random value in this space
  – represents the “position” on the ring
![Page 53: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/53.jpg)
DynamoDB: CH 2/2
• Data item identified by a key
• Assign to a node by hashing the key to yield its position on the ring
• Walk the ring clockwise to find the first node with a position larger than the item’s position
• Each node is responsible for the region in the ring between it and its predecessor node on the ring
• Note:
  – departure or arrival of a node only affects its immediate neighbors
  – the other nodes remain unaffected
  – K/N keys are remapped on average!
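The ring walk described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Dynamo's implementation: the `HashRing` class, the `ring_position` helper, the node names, and the use of MD5 to place nodes and keys are all hypothetical choices.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a node name or key to a position on the ring (MD5 is an assumption)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Basic consistent hashing: one position per node, no virtual nodes."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        # First node position strictly larger than the key's position,
        # wrapping around to the smallest position at the end of the ring.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[i % len(self._positions)]]

ring = HashRing(["node-A", "node-B", "node-C", "node-D"])
keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_node("node-E")  # arrival of a node
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
old_owners = {before[k] for k in moved}
print(len(moved), "of", len(keys), "keys moved to node-E, all taken from", old_owners)

ring.remove_node("node-E")  # departure restores the previous assignment
restored = {k: ring.lookup(k) for k in keys}
```

Every key that moves lands on the new node and comes from a single neighbor, its clockwise successor. The fraction that moves is K/N only on average; with one random position per node and this few nodes, a single run can be far from it.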
![Page 54: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/54.jpg)
DynamoDB: Challenges in CH
• However, this basic CH algorithm poses some challenges
  1. Random position assignment of each node on the ring leads to non-uniform data and load distribution
  2. The basic algorithm is oblivious to the heterogeneity in the performance of the nodes
• Solution: Dynamo uses a variant of CH
  – Each node gets assigned to multiple points in the ring
  – Each such point is called a “virtual node”
  – One node takes care of multiple virtual nodes
![Page 55: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/55.jpg)
DynamoDB: Virtual Nodes
• Using virtual nodes has advantages
  1. If a node becomes unavailable (due to failures or routine maintenance), the load handled by this node is evenly dispersed across the remaining available nodes
  2. When a node becomes available again, or a new node is added to the system, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes
  3. The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure
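The effect of virtual nodes on load balance can be checked with a small experiment. The sketch below is an assumption-laden illustration rather than Dynamo's code: the function names, the `node#vnode-i` naming scheme, the counts of 200 and 400 virtual nodes, and MD5 as the hash are all made up for the example.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(name: str) -> int:
    """Map a name to a ring position (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)

def build_ring(vnodes_per_node):
    """Place each physical node at several ring positions (its virtual nodes)."""
    ring = [
        (ring_position(f"{node}#vnode-{i}"), node)
        for node, count in vnodes_per_node.items()
        for i in range(count)
    ]
    ring.sort()
    return ring

def load_share(vnodes_per_node, keys):
    """Fraction of keys that each physical node ends up responsible for."""
    ring = build_ring(vnodes_per_node)
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for key in keys:
        i = bisect.bisect_right(positions, ring_position(key))
        counts[ring[i % len(ring)][1]] += 1
    return {node: counts[node] / len(keys) for node in vnodes_per_node}

keys = [f"key-{i}" for i in range(20_000)]
one_point = load_share({"A": 1, "B": 1, "C": 1, "D": 1}, keys)
many_points = load_share({"A": 200, "B": 200, "C": 200, "D": 200}, keys)
weighted = load_share({"A": 400, "B": 200, "C": 200, "D": 200}, keys)

print("1 point per node:    ", one_point)
print("200 vnodes per node: ", many_points)
print("A has twice as many: ", weighted)
```

With one position per node the shares are typically quite uneven; with many virtual nodes each share settles near 1/4, and giving node A twice as many virtual nodes gives it roughly twice the load of the others, which is how capacity differences can be accounted for.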
![Page 56: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/56.jpg)
DynamoDB: Replication
• Dynamo replicates its data on multiple (N) hosts for high availability and durability
• Each key k is assigned to a coordinator node, which is in charge of replication
  – the coordinator handles all keys in its range
• The coordinator replicates each key it is in charge of
  – by storing it locally
  – by replicating it at the N-1 clockwise successor nodes in the ring
• Each node is in charge of the region of the ring between it and its N-th predecessor
• Example (N = 3, ring order A, B, C, D, key K between A and B):
  – Node B replicates key K at nodes C and D
  – Node D will store keys in the ranges (A, B], (B, C], (C, D]
• Note: with virtual nodes, the first N positions on the ring may belong to < N “physical” nodes
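The replica placement in the example can be sketched as a walk along the ring that collects N distinct physical nodes. This is a hedged illustration: `preference_list`, the integer ring positions, and the single-letter node names are assumptions made for the example, and skipping repeated physical nodes is one way to handle the case where several adjacent positions belong to the same host.

```python
import bisect

def preference_list(ring, key_pos, n_replicas):
    """Return the coordinator followed by its clockwise successors.

    ring is a sorted list of (position, physical_node) pairs. bisect_left is
    used so that a node owns the range (predecessor, node], matching the
    ranges written in the example above.
    """
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_left(positions, key_pos)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:  # skip extra positions of the same physical node
            chosen.append(node)
        if len(chosen) == n_replicas:
            break
    return chosen

# Four nodes at made-up positions; key K hashes to 15, between A and B.
ring = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]
print(preference_list(ring, 15, 3))   # coordinator B, replicas at C and D

# D appears in the list for keys in (A, B], (B, C] and (C, D], but not (D, A].
print(["D" in preference_list(ring, pos, 3) for pos in (15, 25, 35, 5)])

# With virtual nodes, B owns two adjacent positions; the walk skips the repeat.
vring = [(10, "A"), (20, "B"), (30, "B"), (40, "C"), (50, "D")]
print(preference_list(vring, 15, 3))
```

If the ring holds fewer than N distinct physical nodes, the function simply returns all of them, which mirrors the note that fewer than N physical nodes may be available.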
![Page 57: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/57.jpg)
CH History
• Proposed by CS theoreticians from MIT:
  – Karger, Lehman, Leighton, Panigrahy, Levine, and Lewin
  – “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”, STOC 1997
• Consistent hashing gave birth to Akamai Technologies
  – Founded by Danny Lewin and Tom Leighton in 1998
  – Akamai’s content delivery network is one of the largest distributed computing platforms
  – Now has a market cap of $12B and 6200 employees
  – Manages the web presence of many major companies
• 2001: The concept of the Distributed Hash Table (DHT) is proposed (how to look for a file) and CH was re-purposed
• Now used in Dynamo, Couchbase, Cassandra, Voldemort, Riak, ...
![Page 58: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/58.jpg)
SQL vs. NoSQL
Arguments for both sides: still a controversial topic
![Page 59: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/59.jpg)
Why choose RDBMS over NoSQL: 1/3
1. If new relational systems can do everything that a NoSQL system can, with analogous performance and scalability (?), and with the convenience of transactions and SQL, NoSQL is not needed
2. Relational DBMSs have taken and retained majority market share over other competitors in the past 30 years
  – (network, object, and XML DBMSs)
![Page 60: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/60.jpg)
Why choose RDBMS over NoSQL: 2/3
3. Successful relational DBMSs have been built to handle other specific application loads in the past:
  – read-only or read-mostly data warehousing
  – OLTP on multi-core multi-disk CPUs
  – in-memory databases
  – distributed databases, and
  – now horizontally scaled databases
![Page 61: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/61.jpg)
Why choose RDBMS over NoSQL: 3/3
4. While there is no “one size fits all” in the SQL products themselves, there is a common interface with SQL, transactions, and relational schema that gives advantages in training, continuity, and data interchange
![Page 62: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/62.jpg)
Why choose NoSQL over RDBMS: 1/3
1. We haven’t yet seen good benchmarks showing that RDBMSs can achieve scaling comparable with NoSQL systems like Google’s Bigtable
2. If you only require a lookup of objects based on a single key
  – then a key-value store is adequate and probably easier to understand than a relational DBMS
  – Likewise for a document store on a simple application: you only pay the learning curve for the level of complexity you require
![Page 63: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/63.jpg)
Why choose NoSQL over RDBMS: 2/3
3. Some applications require a flexible schema
  – allowing each object in a collection to have different attributes
  – While some RDBMSs allow efficient “packing” of tuples with missing attributes, and some allow adding new attributes at runtime, this is uncommon
![Page 64: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/64.jpg)
Why choose NoSQL over RDBMS: 3/3
4. A relational DBMS makes “expensive” (multi-node multi-table) operations “too easy”
  – NoSQL systems make them impossible or obviously expensive for programmers
5. While RDBMSs have maintained majority market share over the years, other products have established smaller but non-trivial markets in areas where there is a need for particular capabilities
  – e.g. indexed objects with products like BerkeleyDB, or graph-following operations with object-oriented DBMSs
![Page 65: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/65.jpg)
Column Store
![Page 66: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/66.jpg)
Row vs. Column Store
• Row store
  – store all attributes of a tuple together
  – storage like “row-major order” in a matrix
• Column store
  – store all rows for an attribute (column) together
  – storage like “column-major order” in a matrix
• e.g.
  – MonetDB, Vertica (earlier, C-Store), SAP/Sybase IQ, Google Bigtable (with column groups)
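The two layouts can be illustrated with a tiny table flattened into a one-dimensional "storage" list. The table contents, the `schema` tuple, and the variable names below are made up for the example; real systems add pages, compression, and much more on top of this ordering.

```python
# A small relation with three attributes (illustrative data).
schema = ("id", "name", "age")
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]

# Row store: all attributes of one tuple are adjacent (row-major order).
row_layout = [value for row in rows for value in row]

# Column store: all values of one attribute are adjacent (column-major order).
columns = {attr: [row[i] for row in rows] for i, attr in enumerate(schema)}
column_layout = [value for attr in schema for value in columns[attr]]

print(row_layout)
print(column_layout)

# Reading a single attribute: the row layout needs a strided scan over every
# tuple, while the column layout reads one contiguous run.
ages_from_rows = row_layout[schema.index("age")::len(schema)]
ages_from_columns = columns["age"]
print(sum(ages_from_columns) / len(ages_from_columns))
```

A query that touches only `age` reads one contiguous run in the column layout, whereas reconstructing a whole tuple is the cheap operation in the row layout.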
![Page 67: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/67.jpg)
Ack: Slide from VLDB 2009 tutorial on Column Store
![Page 68: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/68.jpg)
Ack: Slide from VLDB 2009 tutorial on Column Store
![Page 69: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/69.jpg)
Ack: Slide from VLDB 2009 tutorial on Column Store
![Page 70: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/70.jpg)
Ack: Slide from VLDB 2009 tutorial on Column Store
![Page 71: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/71.jpg)
Ack: Slide from VLDB 2009 tutorial on Column Store
![Page 72: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/72.jpg)
Additional and Optional Slides on MongoDB
(May be useful for HW3) https://docs.mongodb.com
![Page 73: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/73.jpg)
MongoDB
• MongoDB is an open source document store written in C++
• provides indexes on collections
• lockless
• provides a document query mechanism
• supports automatic sharding
• Replication is mostly used for failover
• does not provide the global consistency of a traditional DBMS
  – but you can get local consistency on the up-to-date primary copy of a document
• supports dynamic queries with automatic use of indices, like RDBMSs
• also supports map-reduce
  – helps complex aggregations across docs
• provides atomic operations on fields
Optional slide: Read yourself
![Page 74: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/74.jpg)
MongoDB: Atomic Ops on Fields
• The update command supports “modifiers” that facilitate atomic changes to individual values
  – $set sets a value
  – $inc increments a value
  – $push appends a value to an array
  – $pushAll appends several values to an array (deprecated in later MongoDB versions in favor of $push with $each)
  – $pull removes a value from an array, and $pullAll removes several values from an array
• Since these updates normally occur “in place”, they avoid the overhead of a return trip to the server
• There is an “update if current” convention for changing a document only if field values match a given previous value
• MongoDB supports a findAndModify command to perform an atomic update and immediately return the updated document
  – useful for implementing queues and other data structures requiring atomicity
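The meaning of the modifiers can be illustrated with a toy, in-memory model. The sketch below is not MongoDB code and makes no server calls: `apply_update` and `update_if_current` are hypothetical helpers that mimic the modifier semantics on plain Python dicts, handle top-level fields only, and do not model the atomicity that the server provides.

```python
import copy

def apply_update(doc, update):
    """Return a copy of doc with MongoDB-style modifiers applied (toy model)."""
    new_doc = copy.deepcopy(doc)
    for op, changes in update.items():
        for field, value in changes.items():
            if op == "$set":
                new_doc[field] = value
            elif op == "$inc":
                # A missing field is treated as 0 before incrementing.
                new_doc[field] = new_doc.get(field, 0) + value
            elif op == "$push":
                new_doc.setdefault(field, []).append(value)
            elif op == "$pull":
                if field in new_doc:
                    new_doc[field] = [v for v in new_doc[field] if v != value]
            else:
                raise ValueError(f"unsupported modifier: {op}")
    return new_doc

def update_if_current(doc, expected, update):
    """'Update if current': apply update only if the expected values still match."""
    if all(doc.get(field) == value for field, value in expected.items()):
        return apply_update(doc, update), True
    return doc, False

doc = {"_id": 1, "views": 10, "tags": ["db", "sql"]}
updated = apply_update(doc, {
    "$set": {"title": "NoSQL"},
    "$inc": {"views": 1},
    "$push": {"tags": "nosql"},
})
pulled = apply_update(updated, {"$pull": {"tags": "sql"}})
print(updated)
print(pulled)

# The first call sees the expected value and applies; the second is stale.
_, applied = update_if_current(doc, {"views": 10}, {"$inc": {"views": 1}})
_, stale_applied = update_if_current(doc, {"views": 99}, {"$inc": {"views": 1}})
print(applied, stale_applied)
```

In a real deployment the same update document would be sent to the server in a single update command, so the read-modify-write happens on the server rather than in the client.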
![Page 75: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/75.jpg)
MongoDB: Index
• MongoDB indices are explicitly defined using an ensureIndex call (createIndex in MongoDB 3.0 and later)
  – any existing indices are automatically used for query processing
• To find all products released last year (2015) or later costing under $100 you could write:
  db.products.find({released: {$gte: new Date(2015, 0, 1)}, price: {$lt: 100}})
  – note that months in the Date constructor are 0-based, so (2015, 0, 1) is January 1, 2015
![Page 76: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/76.jpg)
MongoDB: Data
• MongoDB stores data in a binary JSON-like format called BSON
  – BSON supports boolean, integer, float, date, string and binary types
  – MongoDB can also support large binary objects, e.g. images and videos
  – These are stored in chunks that can be streamed back to the client for efficient delivery
![Page 77: CompSci516 Data Intensive Computing Systems · 2016-11-02 · 2. Horizontal vs. Vertical Scaling • Effective use of multiple cores (vertical scaling) is important – but the number](https://reader034.vdocument.in/reader034/viewer/2022042203/5ea4cc498d77c9559b62084e/html5/thumbnails/77.jpg)
MongoDB: Replication
• MongoDB supports master-slave replication with automatic failover and recovery
  – Replication (and recovery) is done at the level of shards
  – Replication is asynchronous for higher performance, so some updates may be lost on a crash