CS 133: Databases
Fall 2019 Lec 25 – 12/10
NoSQL
Prof. Beth Trushkowsky
Final Exam Logistics
• Final exam take-home
– Available: in-class on Thursday
– Due: Wednesday December 18th, 5:15pm
• Same resources as midterm
– Except this time, two note sheets allowed (can re-use your own from midterm)
Goals for Today
• Discuss data replication in distributed DBMSs
• Understand the motivation and goals for “NoSQL” data management systems
• Reason about the key concepts, techniques, and tradeoffs for NoSQL systems
– Touch on a couple specific NoSQL systems (Dynamo, MongoDB, Cassandra)
Some References Used
• Ten Rules for Scalable Performance in “Simple Operation” Datastores – Communications of the ACM 2011 – Stonebraker and Cattell
• Scalable SQL and NoSQL Data Stores – SIGMOD Record 2011 – Cattell
• Dynamo: Amazon’s Highly Available Key-value Store – SOSP 2007 – DeCandia et al.
• MongoDB and Cassandra websites
Asynchronous Replication
• The modifying xact can commit before all copies have been changed
– Users/apps must be aware of which copy they are reading, and that copies may be out-of-sync for short periods of time
• Two approaches for replication:
– Primary Site
– Peer-to-Peer (aka update-anywhere)
– Difference lies in how many copies are “updatable”
Primary Site Replication
• Exactly one copy of a relation partition is designated the primary copy.
– Replicas at other sites cannot be directly updated
• How are changes to the primary copy propagated to the secondary copies?
– One approach: log shipping
[Figure: the primary copy serves reads/writes; two secondary copies serve reads only]
If reads happen at secondary copies, a xact may not be able to read its own writes
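A minimal sketch of log shipping, to make the stale-read problem concrete. The `Primary` and `Secondary` classes and their methods are hypothetical names for illustration, not any real system's API; a real DBMS ships write-ahead log records over the network.

```python
class Primary:
    """The single updatable copy; every write is appended to a log."""
    def __init__(self):
        self.data = {}
        self.log = []          # ordered (key, value) records to ship

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    """Read-only copy that catches up by replaying shipped log records."""
    def __init__(self):
        self.data = {}
        self.applied = 0       # number of log records replayed so far

    def apply_log(self, log):
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

    def read(self, key):
        return self.data.get(key)


primary, secondary = Primary(), Secondary()
primary.write("x", 1)                 # commits at the primary right away
stale = secondary.read("x")           # log not shipped yet: write is invisible
secondary.apply_log(primary.log)      # asynchronous shipping happens later
fresh = secondary.read("x")           # now the secondary has caught up
```

Between the write and the call to `apply_log`, a read at the secondary misses the xact's own write.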
Peer-to-Peer Replication
• More than one of the copies of an object can be primary
• Changes to a copy must be propagated to other copies
• If two copies are updated in a conflicting manner, this must be resolved
– E.g., Last write wins? Combine updates somehow?
[Figure: three copies, each serving reads/writes]
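The two resolution ideas above could look roughly like this. This is a sketch that assumes each copy carries a `(timestamp, set_of_items)` pair; the function names and the set-union merge are illustrative choices, not a specific system's behavior.

```python
def last_write_wins(a, b):
    """Keep whichever version has the later timestamp; the other update is lost."""
    return a if a[0] >= b[0] else b


def merge_union(a, b):
    """Combine updates: keep the later timestamp and the union of both item sets."""
    return (max(a[0], b[0]), a[1] | b[1])


copy1 = (10, {"milk", "eggs"})        # updated at one site
copy2 = (12, {"milk", "tea"})         # updated concurrently at another site

lww = last_write_wins(copy1, copy2)
merged = merge_union(copy1, copy2)
```

Last-write-wins is simple but silently drops "eggs"; the union keeps everything but cannot express a deletion.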
Strong vs. Eventual Consistency
• Strong: after update to an object, subsequent reads see that update
• Weak: subsequent reads may not reflect that update
– Eventual: if updates ceased, eventually the system would reflect all updates
• Eventual consistency has some variation
– Read-your-own-writes, special case of session or causal consistency
– Monotonic reads
• BASE, not ACID!
– BASE: Basically available, soft state, eventually consistent
Werner Vogels on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
“Eventually Consistent - Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability.” - Vogels, CTO Amazon.com
Exercise 2: Replica Consistency
• Suppose we have N replicas of some data object
– W = # replicas written to before xact commits
– R = # replicas read from
• Strong consistency: overlap the W and R sets
– R + W > N
– E.g., Read-one-write-all: R = 1, W = N
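A quick check of the overlap rule. The function names are made up for this sketch; the brute-force version enumerates every possible write set and read set to confirm that R + W > N is exactly the condition under which they always share a replica.

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """The rule from the slide: read and write sets must overlap."""
    return r + w > n


def always_overlap(n, r, w):
    """Brute force: every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))
```

For example, with N = 3, choosing R = 2 and W = 2 satisfies the rule, while R = 1 and W = 2 does not.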
What is “NoSQL”?
Here’s a challenge: can you describe NoSQL without using any buzzwords?!
A movement around non-relational data stores
• Web 2.0: tons of user-generated content on the web
– E.g., social networking sites
– Web 1.0: few content creators, more static websites
• Agile Development: develop and change web applications iteratively
– Schema changes and non-uniformity
– Make changes while the application is live
• Eventual Consistency (BASE not ACID): performance gains at the expense of consistency
• Grow incrementally by leveraging leased cloud resources like IaaS
• Automatically grow resources as needed
Story of a “Successful” Web Startup
• Start with a relational DBMS running on single machine
• Website gets popular!! Need to scale up…
• …so manually partition/shard data across more nodes
• Logic in web application manages directing queries
– Cross-shard filters and joins coded inside the app
– App logic deals with data consistency
• As the number of machines increases, the chance that something fails increases
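What "logic in the web application" can end up looking like, as a rough sketch. Each dict stands in for one DBMS node; `shard_for`, `put_user`, `get_user`, and `find_users` are hypothetical helpers, and the sample rows are invented for illustration.

```python
import zlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]   # each dict stands in for one DBMS node


def shard_for(user_id):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings
    return zlib.crc32(user_id.encode()) % NUM_SHARDS


def put_user(user_id, row):
    shards[shard_for(user_id)][user_id] = row


def get_user(user_id):
    # Key lookup: the app routes the query to exactly one shard
    return shards[shard_for(user_id)].get(user_id)


def find_users(predicate):
    """Cross-shard filter: the app must visit every shard and combine results."""
    return sorted(uid for shard in shards
                  for uid, row in shard.items() if predicate(row))


put_user("alice", {"city": "Claremont"})
put_user("bob", {"city": "Pasadena"})
put_user("carol", {"city": "Claremont"})
in_claremont = find_users(lambda row: row["city"] == "Claremont")
```

Note that changing `NUM_SHARDS` remaps most keys under this modulo scheme, which is part of why manual resharding is painful.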
Tons of NoSQL systems
CAP Theorem
• Eric Brewer’s CAP theorem: a distributed system can only have two of the following three properties:
– Consistency (of replicated data)
– Availability (for write requests)
– Tolerance to network Partitions
Summary: NoSQL Motivation
• Development of NoSQL systems motivated by difficulty scaling up Web 2.0 applications
– Thousands to millions of users
– Many [small] reads and writes (“small operations”)
• Typically make sacrifices for performance
– E.g., no ACID xacts, eventual consistency
Achieving Scalable Performance
• Rule #1: Shared-nothing scalability
• Rule #4: High availability and automatic recovery essential
• Rule #5: On-line everything (system always “up”)
• Rule #6: Avoid multi-node operators
Stonebraker and Cattell
Rule #1: Shared-nothing scalability
• Goal: as application grows, need more servers added seamlessly
– Don’t want manual management of scaling up
• Example techniques:
– Consistent hashing (Dynamo and Cassandra)
– Periodic re-balancing of partitions (MongoDB)
Fig 2 in Dynamo
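A bare-bones consistent hashing ring, to show why adding a server moves only a small share of keys. This is a sketch: the `HashRing` class is hypothetical, MD5 is used here only as a convenient stable hash, and virtual nodes and replication (which production systems layer on top) are left out.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a string to a point on the ring (an integer from a stable hash)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._points = []      # sorted ring positions of the nodes
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        pos = ring_position(node)
        bisect.insort(self._points, pos)
        self._owner[pos] = node

    def lookup(self, key):
        """Walk clockwise from the key's position to the first node."""
        i = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.lookup(k) for k in keys}
ring.add("D")                                   # a new server joins
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new node D; keys never shuffle between the existing nodes, unlike with `hash(key) % num_servers`.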
Rule #4: High Availability and Auto-Recovery
• Goal: updates always succeed!
– Issue: conflicting writes on disjoint sets of replicas
Exercise 3
• Example: shopping cart
– Add 2 items to cart, update goes to two replicas
– Partition! Add 1 (different) item to each replica
– Both carts are “version 2” :(
• Dynamo: vector clocks
• Or latest timestamp wins?
“…in the case of a timestamp tie, Cassandra follows two rules: first, deletes take precedence over inserts/updates. Second, if there are two updates, the one with the lexically larger value is selected.”
https://wiki.apache.org/cassandra/FAQ#clocktie
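A small vector clock sketch for the shopping cart scenario. A clock is assumed to be a dict mapping replica name to counter; the replica names X and Y and the helper functions are invented for illustration. The point is that a single version number cannot tell the two carts apart, while per-replica counters can.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter bumped."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen everything clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"


# Add 2 items to cart: the write is handled by replica X and copied to Y.
base = increment({}, "X")            # {"X": 1}
# Partition! Each replica accepts one (different) item on its own.
cart_x = increment(base, "X")        # {"X": 2}
cart_y = increment(base, "Y")        # {"X": 1, "Y": 1}
verdict = compare(cart_x, cart_y)
```

Neither clock descends from the other, so the system knows the carts diverged and can hand both versions to the application to reconcile.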
Replication/Availability Examples
• MongoDB: automatic failover for primary
• Cassandra/Dynamo: peer-to-peer replication
– tunable consistency, e.g., quorum or not
Pics from https://docs.mongodb.org/master/MongoDB-replication-guide-master.pdf
Rule #5: On-line Everything (Schema)
• Recall:
– A data model is a collection of high-level data description constructs
– A schema is a description of a particular collection of data, using a given data model
• Relational model has a rigid, structured schema
– Attributes for relation pre-defined, shared by all tuples
– Data and integrity constraints
– Referential constraints
What if you want more flexibility?
NoSQL: Non-Relational Data Models
• Agile development, live schema changes
– No enforcement of structure
– E.g., every “tuple” could have different attributes
• In essence, these data models are key-based
– Key: some unique identifier to look up a corresponding “value”
– What the value is can be complex
Key typically plays a role in data partitioning scheme
Key-Value Data Model
• Example system: Amazon’s Dynamo
• Key is some unique identifier, value can be anything, BLOB interpreted by app logic
– E.g., id → shopping cart contents
• Query functionality
– Get(key), put(key, value)
– Only primary key index
– No index lookups on non-keys (secondary indexes)
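The get/put interface can be sketched in a few lines. `KeyValueStore` is a toy in-memory stand-in, not Dynamo's API; the `cart:42` key format and JSON encoding are assumptions made for this example.

```python
import json


class KeyValueStore:
    """Values are opaque bytes; only the application knows how to decode them."""
    def __init__(self):
        self._data = {}    # the only index: primary key -> blob

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


store = KeyValueStore()
store.put("cart:42", json.dumps(["grommet", "widget"]).encode())
store.put("cart:43", json.dumps(["widget"]).encode())

cart = json.loads(store.get("cart:42"))     # app logic interprets the BLOB

# No secondary index: finding carts that contain "widget" means decoding
# every blob (reaching into _data here only to illustrate the full scan).
with_widget = sorted(k for k, blob in store._data.items()
                     if "widget" in json.loads(blob))
```

The store never looks inside a value, so any query that is not a lookup by key becomes the application's problem.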
Document Data Model
• Example system: MongoDB
• Stores collections of “documents” (e.g., JSON)
– Relation : tuple :: Collection : document
– Key → Document
– Document has key-value pairs, can be nested lists or scalars (and not defined in a global schema)
• Query functionality
– Primary key lookups
– Secondary indexes on other attributes
MongoDB Example
Example: info about products, which have many parts
    db.createCollection("parts")
    db.createCollection("products")

    // example part
    {
      _id : ObjectID('AAAA'),
      partno : '123-aff-456',
      name : '#4 grommet',
      qty : 94,
      cost : 0.94,
      price : 3.99,
      manufac_addr : [
        { street: '123 Sesame St', city: 'Anytown', cc: 'USA' },
        { street: '123 Avenue Q', city: 'New York', cc: 'USA' }
      ]
    }
Modified from: http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
One-to-many relationship captured with references
    // example product
    {
      name : 'smoke shifter',
      manufacturer : 'Acme Corp',
      catalog_number : 1234,
      parts : [ ObjectID('AAAA'), ObjectID('F17C'), ObjectID('D2AA') ]
    }
What about a JOIN?
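One answer is to do the join in the application: fetch the product, then look up each referenced part by key. The sketch below models collections as plain Python dicts rather than calling a real driver; the two extra parts are hypothetical placeholders, since the slide only shows part AAAA.

```python
# Collections modeled as dicts keyed by _id; plain strings stand in for ObjectIDs.
parts = {
    "AAAA": {"partno": "123-aff-456", "name": "#4 grommet"},
    "F17C": {"partno": "hypothetical-1", "name": "hypothetical part 1"},
    "D2AA": {"partno": "hypothetical-2", "name": "hypothetical part 2"},
}
product = {
    "name": "smoke shifter",
    "manufacturer": "Acme Corp",
    "catalog_number": 1234,
    "parts": ["AAAA", "F17C", "D2AA"],
}


def join_parts(product, parts_by_id):
    """Application-level join: one lookup by key for each referenced part."""
    return [parts_by_id[part_id] for part_id in product["parts"]]


part_names = [p["name"] for p in join_parts(product, parts)]
```

Each reference costs a separate key lookup, which is the tradeoff against embedding the parts inside the product document.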
Extensible Record (aka Column Family)
• Example system: BigTable, Cassandra
• A bit more complex than document model
– Relation : tuple :: Column Family : Row
– Key → Set of columns (“wide-column store”)
– Each column has key-value pairs
– Different records can have different columns
• Query functionality in “CQL”
– Primary key lookups by row (with sorted columns)
– Secondary indexes
“Row” called “partition” now
Cassandra Examples
Roughly based on: http://www.datastax.com/dev/blog/thrift-to-cql3
    CREATE TABLE people (
      user_id text PRIMARY KEY,
      name text,
      addresses list<text>
    );

    CREATE TABLE comments (
      article_id uuid,
      posted_at timestamp,
      author text,
      content text,
      PRIMARY KEY (article_id, posted_at)
    );
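[Figure: how the example rows are stored]
people:
– alicious: addresses = West, North, Drinkward; name = Alice
– bobtastic: addresses = 1 Lonely Ave; name = Bob
– user_id becomes the partition key
comments:
– a4i89: 2015-12-03 (author Bob, content Bravo!), …, 2015-11-11 (author Alice, content Awful :( )
– First argument of PRIMARY KEY is partition key, others are “clustering” columns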
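A rough in-memory model of the comments table, to show what partition key plus clustering column buys you. `CommentsTable` is a hypothetical class, not Cassandra's storage engine; dates are ISO strings so they sort correctly, and the pairing of authors with dates is an assumption read off the figure.

```python
import bisect
from collections import defaultdict


class CommentsTable:
    """partition key -> rows kept sorted by the clustering column (posted_at)."""
    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, article_id, posted_at, author, content):
        bisect.insort(self._partitions[article_id], (posted_at, author, content))

    def by_article(self, article_id, since=None):
        """Primary key lookup: one partition, optionally a range of sorted rows."""
        rows = self._partitions.get(article_id, [])
        if since is None:
            return list(rows)
        # (since,) sorts before any row with the same posted_at, so this
        # finds the first row whose posted_at >= since
        start = bisect.bisect_left(rows, (since,))
        return rows[start:]


table = CommentsTable()
table.insert("a4i89", "2015-12-03", "Bob", "Bravo!")
table.insert("a4i89", "2015-11-11", "Alice", "Awful")

all_comments = table.by_article("a4i89")
recent = table.by_article("a4i89", since="2015-12-01")
```

All comments for one article live together and in time order, so "latest comments on article X" touches a single partition on a single node.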
Rule #6: Avoid Multi-Node Queries
• No ACID transactions across primary keys
• No joins! Denormalization helps
• Systems offer different levels of “protection”
– Key-value stores: get(key) method requires key
– MongoDB: Table scans discouraged
– Cassandra: Table scans prohibited
    db.people.find({ "favorite-color": "blue" }, { name: 1, addresses: 1 })