final exam logistics cs 133: databasesbeth/courses/cs133/current/... · 2019-12-11 · cs 133:...

7
CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics Final exam take-home Available: in-class on Thursday Due: Wednesday December 18 th , 5:15pm Same resources as midterm Except this time, two note sheets allowed (can re- use your own from midterm) Goals for Today Discuss data replication in distributed DBMSs Understand the motivation and goals for “NoSQL” data management systems Reason about the key concepts, techniques, and tradeoffs for NoSQL systems Touch on a couple specific NoSQL systems (Dynamo, MongoDB, Cassandra) Some References Used Ten Rules for Scalable Performance in “Simple Operation” Datastores Communications of the ACM 2011 Stonebraker and Cattell Scalable SQL and NoSQL Data Stores SIGMOD Record 2011 Cattell Dynamo: Amazon’s Highly Available Key-value Store SOSP 2007 DeCandia et al MongoDB and Cassandra web sites

Upload: others

Post on 09-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

CS133:Databases

Fall2019Lec25–12/10

NoSQL

Prof.BethTrushkowsky

FinalExamLogistics

•  Finalexamtake-home– Available:in-classonThursday– Due:WednesdayDecember18th,5:15pm

•  Sameresourcesasmidterm– Exceptthistime,twonotesheetsallowed(canre-useyourownfrommidterm)

GoalsforToday

•  DiscussdatareplicationindistributedDBMSs

•  Understandthemotivationandgoalsfor“NoSQL”datamanagementsystems

•  Reasonaboutthekeyconcepts,techniques,andtradeoffsforNoSQLsystems– TouchonacouplespecificNoSQLsystems(Dynamo,MongoDB,Cassandra)

SomeReferencesUsed

•  TenRulesforScalablePerformancein“SimpleOperation”Datastores–  CommunicationsoftheACM2011–  StonebrakerandCattell

•  ScalableSQLandNoSQLDataStores–  SIGMODRecord2011–  Cattell

•  Dynamo:Amazon’sHighlyAvailableKey-valueStore–  SOSP2007–  DeCandiaetal

•  MongoDBandCassandrawebsites

Page 2: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

AsynchronousReplication

•  Themodifyingxactcancommitbeforeallcopieshavebeenchanged– Users/appsmustbeawareofwhichcopytheyarereading,andthatcopiesmaybeout-of-syncforshortperiodsoftime

•  Twoapproachesforreplication:–  PrimarySite

–  Peer-to-Peer(akaorupdate-anywhere)– Differenceliesinhowmanycopiesare“updatable”

PrimarySiteReplication•  Exactlyonecopyofarelationpartitionisdesignatedtheprimarycopy.–  Replicasatothersitescannotbedirectlyupdated

•  Howarechangestotheprimarycopypropagatedtothesecondarycopies?– Oneapproach:logshipping

reads/writes

readsonly

readsonlyIfreadshappenatsecondarycopies,thenpossiblethataxactisnotbeabletoreaditsownwrites

Peer-to-PeerReplication

•  Morethanoneofthecopiesofanobjectcanbeprimary

•  Changestoacopymustbepropagatedtoothercopies

•  Iftwocopiesareupdatedinaconflictingmanner,thismustberesolved–  E.g.,Lastwritewins?Combineupdatessomehow?

reads/writes

reads/writes

reads/writes

Strongvs.EventualConsistency•  Strong:afterupdatetoanobject,subsequentreadsseethatupdate•  Weak:subsequentreadsofanupdatemaynotreflectthatupdate

–  Eventual:ifupdatesceased,eventuallythesystemwouldreflectallupdates

•  Eventualconsistencyhassomevariation

–  Read-your-own-writes,specialcaseofsessionorcausalconsistency–  Monotonicreads

•  BASE,notACID!

–  BASE:Basicallyavailable,softstate,eventuallyconsistentWernerVogelsoneventualconsistency:http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

“EventuallyConsistent-Buildingreliabledistributedsystemsataworldwidescaledemandstrade-offsbetweenconsistencyandavailability.”-Vogels,CTOAmazon.com

Page 3: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

Exercise2:ReplicaConsistency

•  SupposehaveNreplicasofsomedataobject– W=#replicaswritetobeforexactcommits

– R=#replicasreadfrom

•  Strongconsistency:overlaptheWandRsets– R+W>N

– E.g.,Read-one-write-all:R=1,W=N

Whatis“NoSQL”?Here’sachallenge:canyoudescribeNoSQLwithoutusinganybuzzwords?!

Web2.0Agile

Development

EventualConsistency(BASEnotACID)

Amovementaroundnon-relationaldatastores

Tonsofuser-generatedcontentontheweb•  E.g.,social

networkingsites•  Web1.0:few

contentcreators,morestaticwebsites

Developandchangewebapplicationsiteratively•  Schemachangesand

non-uniformity•  Makechangeswhile

theapplicationislive

Performancegainsattheexpenseofconsistency•  Growincrementallyby

leveragingleasedcloudresourceslikeIaaS

•  Automaticallygrowresourcesasneeded

Storyofa“Successful”WebStartup

•  StartwitharelationalDBMSrunningonsinglemachine

•  Logicinwebapplicationmanagesdirectingqueries–  Cross-shardfiltersandjoinscodedinsidetheapp–  Applogicdealswithdataconsistency

Websitegetspopular!!Needtoscaleup…

…somanuallypartition/sharddataacrossmorenodes

Asthenumberofmachinesincreases,thechancethatsomethingfailsincreases

TonsofNoSQLsystems

Page 4: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

CAPTheorem•  EricBrewer’sCAPtheorem:adistributedsystemcan

onlyhavetwoofthefollowingthreeproperties:–  Consistency

–  Availability

–  TolerancetonetworkPartitions

Ofreplicateddata

Forwriterequests

Summary:NoSQLMotivation

•  DevelopmentofNoSQLsystemsmotivatedbydifficultyscalingupWeb2.0applications–  Thousandstomillionsofusers

– Many[small]readsandwrites(“smalloperations”)

•  Typicallymakesacrificesforperformance–  E.g.,noACIDxacts,eventualconsistency

AchievingScalablePerformance

•  Rule#1:Shared-nothingscalability

•  Rule#4:Highavailabilityandautomaticrecoveryessential

•  Rule#5:On-lineeverything(systemalways“up”)

•  Rule#6:Avoidmulti-nodeoperators

StonebrakerandCattell

Rule#1:Shared-nothingscalability

•  Goal:asapplicationgrows,needmoreserversaddedseamlessly–  Don’twantmanualmanagementofscalingup

•  Exampletechniques:–  Consistenthashing(DynamoandCassandra)–  Periodicre-balancingofpartitions(MongoDB)

Fig2inDynamo

Page 5: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

Rule#4:HighAvailabilityandAuto-Recovery

•  Goal:updatesalwayssucceed!–  Issue:conflictingwritesondisjointsetsofreplicas

Exercise3

•  Example:shoppingcart– Add2itemstocart,updategoestotworeplicas

– Partition!Add1(different)itemtoeachreplica

– Bothcartsare“version2”L

•  Dynamo:vectorclocks

•  Orlatesttimestampwins?

“…inthecaseofatimestamptie,Cassandrafollowstworules:first,deletestakeprecedenceoverinserts/updates.Second,iftherearetwoupdates,theonewiththelexicallylargervalueisselected.”

https://wiki.apache.org/cassandra/FAQ#clocktie

Replication/AvailabilityExamples

•  MongoDB:automaticfailoverforprimary

•  Cassandra/Dynamo:peer-to-peerreplication–  tunableconsistency,e.g.,quorumornot

Picsfromhttps://docs.mongodb.org/master/MongoDB-replication-guide-master.pdf

Rule#5:On-lineEverything(Schema)

•  Recall:–  Adatamodelisacollectionofhigh-leveldatadescriptionconstructs

–  Aschemaisadescriptionofaparticularcollectionofdata,usingagivendatamodel

•  Relationalmodelhasarigid,structuredschema–  Attributesforrelationpre-defined,sharedbyalltuples–  Dataandintegrityconstraints–  Referentialconstraints

Whatifyouwantmoreflexibility?

Page 6: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

NoSQL:Non-RelationalDataModels

•  Agiledevelopment,liveschemachanges– Noenforcementofstructure– E.g.,every“tuple”couldhavedifferentattributes

•  Inessence,thesedatamodelsarekey-based– Key:someuniqueidentifiertolookupacorresponding“value”

– Whatthevalueiscanbecomplex

Keytypicallyplaysaroleindatapartitioningscheme

Key-ValueDataModel

•  Examplesystem:Amazon’sDynamo

•  Keyissomeuniqueidentifier,valuecanbeanything,BLOBinterpretedbyapplogic–  E.g.,idàshoppingcartcontents

•  Queryfunctionality– Get(key),put(key,value)– Onlyprimarykeyindex– Noindexlookupsonnon-keys(secondaryindexes)

DocumentDataModel

•  Examplesystem:MongoDB

•  Storescollectionsof“documents”(e.g.,JSON)–  Relation:tuple::Collection:document–  KeyàDocument– Documenthaskey-valuepairs,canbenestedlistsorscalars(andnotdefinedinaglobalschema)

•  Queryfunctionality–  Primarykeylookups–  Secondaryindexesonotherattributes

MongoDBExampleExample:infoaboutproducts,whichhavemanypartsdb.createCollection(”parts") db.createCollection(”products") // example part { _id : ObjectID('AAAA'), partno : '123-aff-456', name : '#4 grommet', qty: 94, cost: 0.94, price: 3.99 manufac_addr : [ { street: '123 Sesame St',

city: 'Anytown', cc: 'USA' }, { street: '123 Avenue Q',

city: 'New York', cc: 'USA' }]}

Modifiedfrom:http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1

One-to-manyrelationshipcapturedwithreferences

// example product{ name : ‘smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ ObjectID('AAAA'), ObjectID('F17C’), ObjectID('D2AA’)]}

WhataboutaJOIN?

Page 7: Final Exam Logistics CS 133: Databasesbeth/courses/cs133/current/... · 2019-12-11 · CS 133: Databases Fall 2019 Lec 25 – 12/10 NoSQL Prof. Beth Trushkowsky Final Exam Logistics

ExtensibleRecord(akaColumnFamily)

•  Examplesystem:BigTable,Cassandra

•  Abitmorecomplexthandocumentmodel–  Relation:tuple::ColumnFamily:Row–  KeyàSetofcolumns(“wide-columnstore”)–  Eachcolumnhaskey-valuepairs– Differentrecordscanhavedifferentcolumns

•  Queryfunctionalityin“CQL”–  Primarykeylookupsbyrow(withsortedcolumns)–  Secondaryindexes

“Row”called“partition”now

CassandraExamples

Roughlybasedon:http://www.datastax.com/dev/blog/thrift-to-cql3

CREATE TABLE people (user_id text PRIMARY KEY,name text,addresses list

);

CREATE TABLE comments (article_id uuid,posted_at timestamp,author text,content text,PRIMARY KEY (article_id,posted_at)

);

alicious addresses name

West Alice

North

Drinkward

bobtastic addresses name

1LonelyAve Bob

a4i89 2015-12-03

2015-11-11

author Bob author Alice

content Bravo! content AwfulL

Firstargumentispartitionkey,othersare“clustering”columns

Becomesthepartitionkey

Rule#6:AvoidMulti-NodeQueries

•  NoACIDtransactionsacrossprimarykeys

•  Nojoins!Denormalizationhelps

•  Systemsofferdifferentlevelsof“protection”– Key-valuestores:get(key)methodrequireskey– MongoDB:Tablescansdiscouraged– Cassandra:Tablescansprohibited

db.people.find( {favorite-color: ‘blue’}, {name: 1, addresses: 1})