l23: nosql (continued) - northeastern university · • redis-next slides and in our...
TRANSCRIPT
16
L23:NoSQL(continued)
CS3200 Databasedesign(sp18 s2)https://course.ccs.neu.edu/cs3200sp18s2/4/9/2018
17
Announcements!
• Pleasepickupyourexamifyouhavenotyet• HW6:startearly• NoSQL:focusonbigpicture- weseehowthingsfromearlierinclasscometogether
18
DatabaseReplication
• Datareplication:storingthesamedataonseveralmachines(“nodes”)• Usefulfor:- Availability (parallelrequestsaremadeagainstreplicas)- Reliability(datacansurvivehardwarefaults)- Faulttolerance(systemstaysalivewhennodes/networkfail)
• Typicalarchitecture:master-slave
Replication example in MySQL (dev.mysql.com)
19
OpenSource
• Freesoftware,sourceprovided- Usershavetherighttouse,modifyanddistributethesoftware- Butrestrictionsmaystillapply,e.g.,adaptationsneedtobeopensource
• Idea:communitydevelopment- Developersfixbugs,addfeatures,...
• Howcanthatwork?- See[Bonaccorsi,Rossi,2003.Whyopensourcesoftwarecansucceed.Researchpolicy,
32(7),pp.1243-1258]
• AmajordriverofOpenSource isApache
20
ApacheSoftwareFoundation
• Non-profitorganization• Hostscommunitiesofdevelopers- Individualsandsmall/largecompanies
• Producesopen-sourcesoftware• Fundingfromgrantsandcontributions• Hostsverysignificantprojects- ApacheWebServer,Hadoop,Zookeeper,Cassandra,Lucene,OpenOffice,Struts,Tomcat,Subversion,Tcl,UIMA,...
21
WeWillLookat4DataModels
Column-Family Store
Key/Value Store
Document Store Graph Databases
Source:BennyKimelfeld
22
Databaseenginesrankingby"popularity"
Source:https://db-engines.com/en/ranking ,4/2018
23
Databaseenginesrankingby"popularity"
Source:https://db-engines.com/en/ranking_trend ,4/2018
24
HighlightedDatabaseFeatures
• Datamodel- Whatdataisbeingstored?
• CRUD interface- APIforCreate,Read,Update,Delete
• 4basicfunctionsofpersistentstorage(insert,select,update,delete)
- SometimesprecedingSforSearch
• Transactionconsistency guarantees
• Replication andsharding model- What’sautomatedandwhat’smanual?
25
TrueandFalseConceptions
• True:- SQLdoesnoteffectivelyhandlecommonWebneedsofmassive(datacenter)data
- SQLhasguaranteesthatcansometimesbecompromisedforthesakeofscaling
- Joinsarenotforfree,sometimesundoable
• False:- NoSQL saysNOtoSQL- NowadaysNoSQL istheonlywaytogo- Joinscanalwaysbeavoidedbystructureredesign
26
• Introduction• TransactionConsistency• 4maindatamodels﹣ Key-ValueStores(e.g.,Redis)﹣ Column-FamilyStores(e.g.,Cassandra)﹣ DocumentStores(e.g.,MongoDB)﹣ GraphDatabases(e.g.,Neo4j)
• ConcludingRemarks
Outline
27
Transaction
• Asequenceofoperations(overdata)viewedasasinglehigher-leveloperation- Transfermoneyfromaccount1toaccount2
• DBMSsexecutetransactionsinparallel- Noproblemapplyingtwo“disjoint”transactions- Butwhatiftherearedependencies?
• Transactionscaneithercommit (succeed)orabort (fail)- Failureduetoviolationofprogramlogic,networkfailures,credit-cardrejection,etc.
• DBMSshouldnotexpecttransactionstosucceed
28
ExamplesofTransactions
• Airlineticketing- Verifythattheseatisvacant,withthepricequoted,thenchargecreditcard,thenreserve
• Onlinepurchasing- Similar
• “Transactionalfilesystems”(MSNTFS)- Movingafilefromonedirectorytoanother:verifyfileexists,copy,delete
• Textbookexample:bankmoneytransfer- Readfromacct#1,verifyfunds,updateacct#1,updateacct#2
29
TransferExample
Begin
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Commit
Begin
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Commit
Begin
Read(A,x)
x = x-100
Write(A,x)
Read(C,y)
y=y+100
Write(C,y)
Commit
txn1 txn2
• Scheduling is the operation of interleaving transactions• Why is it good?
• A serial schedule executes transactions one at a time, from beginning to end
• A good (“serializable”) scheduling is one that behaves like some serial scheduling (typically by locking protocols)
30
SchedulingExample1
Begin
Read(A,x)
x = x-100
Write(A,x)
Read(C,y)
y=y+100
Write(C,y)
Commit
txn1 txn2
Begin
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Commit
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Read(A,x)
x = x-100
Write(A,x)
Read(C,y)
y=y+100
Write(C,y)
31
SchedulingExample2
Begin
Read(A,x)
x = x-100
Write(A,x)
Read(C,y)
y=y+100
Write(C,y)
Commit
txn1 txn2
Begin
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Commit
Read(A,v)
v = v-100
Write(A,v)
Read(B,w)
w=w+100
Write(B,w)
Read(A,x)
x = x-100
Write(A,x)
Read(C,y)
y=y+100
Write(C,y)
32
ACID
• Atomicity- Eitheralloperationsappliedornoneare(hence,weneednotworryabouttheeffectof
incomplete/failedtransactions)• Consistency- Eachtransactioncanstartwithaconsistentdatabaseandisrequiredtoleavethe
databaseconsistent(bringtheDBfromonetoanotherconsistentstate)• Isolation- Theeffectofatransactionshouldbeasifitistheonlytransactioninexecution(in
particular,changesmadebyothertransactionsarenotvisibleuntilcommitted)• Durability- Oncethesysteminformsatransactionsuccess,theeffectshouldholdwithoutregret,
evenifthedatabasecrashes(beforemakingallchangestodisk)
33
ACIDMayBeOverlyExpensive
• Inquiteafewmodernapplications:- ACIDcontrastswithkeydesiderata:highvolume,highavailability-Wecanlivewithsomeerrors,tosomeextent- Ormoreaccurately,weprefertosuffererrorsthantobesignificantlylessfunctional
• Canthispointbemademore“formal”?
34
SimpleModelofaDistributedService
• Context:distributedservice- e.g.,socialnetwork
• Clientsmakeget/setrequests- e.g.,setLike(user,post),getLikes(post)
- Eachclientcantalktoanyserver
• Serversreturnresponses- e.g.,ack,{user1,....,userk}
• Failure:thenetworkmayoccasionallydisconnectduetofailures(e.g.,switchdown)
• Desiderata:Consistency,Availability,Partitiontolerance
35
CAPServiceProperties
• Consistency:- everyread(toanynode)getsaresponsethatreflectsthemostrecentversionofthedata• Moreaccurately,atransactionshouldbehaveasifitchangestheentirestatecorrectlyinaninstant,Ideasimilartoserializability
• Availability:- everyrequest(toalivingnode)getsananswer:setsucceeds,getretunesavalue(ifyoucantalktoanodeinthecluster,itcanreadandwritedata)
• Partitiontolerance:- servicecontinuestofunctiononnetworkfailures(clustercansurvive
• Aslongasclientscanreachservers
36
SimpleIllustration
set(x,1)
set(x,1)
ok
ok
get(x)
1
CAConsistency,Availability
set(x,2)
set(x,2)
wait...
get(x) CPConsistency,Partitiontolerance
set(x,2)
set(x,2)
ok
get(x) APAvailability,Partitiontolerance
1
1
Availability
Consistency
OurRelationalDatabaseworldsofar…
Inasystemthatmaysufferpartitions,youhavetotradeoffconsistencyvs.availability
37
TheCAPTheorem
EricBrewer’sCAPTheorem:
Adistributedservicecansupportatmosttwo
outofC,A andP
38
HistoricalNote
• Brewerpresenteditasthe CAPprinciple ina1999article- ThenasaninformalconjectureinhiskeynoteatthePODC2000conference
• In2002aformalproofwasgivenbyGilbertandLynch,makingCAPatheorem- [SethGilbert,NancyA.Lynch:Brewer'sconjectureandthefeasibilityofconsistent,available,partition-
tolerantwebservices.SIGACTNews33(2):51-59(2002)]
- Itismainlyaboutmakingthestatementformal;theproofisstraightforward
39Source:http://blog.nahurst.com/visual-guide-to-nosql-systems ,2010
40
CAPtheorem
Source:http://guide.couchdb.org
41
TheBASEModel
• AppliestodistributedsystemsoftypeAP• BasicAvailability- Providehighavailabilitythroughdistribution:Therewillbearesponsetoanyrequest.
Responsecouldbea‘failure’toobtaintherequesteddata,orthedatamaybeinaninconsistentorchangingstate.
• Softstate- Inconsistency(staleanswers)allowed:Stateofthesystemcanchangeovertime,so
evenduringtimeswithoutinput,changescanhappendueto‘eventualconsistency’
• Eventualconsistency- Ifupdatesstop,thenaftersometimeconsistencywillbeachieved
• Achievedbyprotocolstopropagateupdatesandverifycorrectnessofpropagation(gossipprotocols)
• Philosophy:besteffort,optimistic,stalenessandapproximationallowed
42
• Introduction• TransactionConsistency• 4maindatamodels﹣ Key-ValueStores(e.g.,Redis)﹣ Column-FamilyStores(e.g.,Cassandra)﹣ DocumentStores(e.g.,MongoDB)﹣ GraphDatabases(e.g.,Neo4j)
• ConcludingRemarks
Outline
43
Key-ValueStores
• Essentially,bigdistributedhashmaps• OriginattributedtoDynamo– Amazon’sDBforworld-scalecatalog/cartcollections- ButBerkeleyDBhasbeenherefor>20years
• Storepairs⟨key,opaque-value⟩- OpaquemeansthatDBdoesnotassociateanystructure/semanticswiththevalue;oblivioustovalues
- Thismaymeanmoreworkfortheuser:retrievingalargevalueandparsingtoextractanitemofinterest
• Sharding viapartitioningofthekeyspace- Hashing,gossipandremappingprotocolsforloadbalancingandfaulttolerance
44
Hashing(Hashtables,dictionaries)
0
m–1
h(k1)
h(k4)
h(k2)
h(k3)
U(universe of keys)
K(actualkeys)
k1
k2
k3
k5
k4
h(k2)
n = |K| << |U|.
key k “hashes” to slot T[h[k]]
hash table T [0…m–1]h : U ® {0,1,…, m–1}
45
Hashing(Hashtables,dictionaries)
h(k1)
h(k4)
h(k2)
h(k3)
k1
k2
k3
k5
k4
collisionh(k2)=h(k5)
0
m–1
U(universe of keys)
K(actualkeys)
hash table T [0…m–1]
46
ExampleDatabases
• Amazon’sDynamoDB- OriginallydesignedforAmazon’sworkloadatpeaks- OfferedaspartofAmazon’sWebservices
• Redis- NextslidesandinourJupyter notebooks
• Riak- Focusesonhighavailability,BASE- “AslongasyourRiak clientcanreachoneRiak server,itshouldbeabletowritedata.”
• FoundationDB- Focusontransactions,ACID
• BerkeleyDB(andOracleNoSQLDatabase)- Firstrelease1994,byBerkeley,acquiredbyOracle- ACID,replication
47
Redis
• Basicallyadatastructureforstrings,numbers,hashes,lists,sets• Simplistic“transaction”management- Queuingofcommandsasblocks,really- AmongACID,onlyIsolationguaranteed
• Ablockofcommandsthatisexecutedsequentially;notransactioninterleaving;norollbackonerrors
• In-memory store- Persistencebyperiodicalsavestodisk
• Comeswith- Acommand-lineAPI- Clientsfordifferentprogramminglanguages
• Perl,PHP,Rubi,Tcl,C,C++,C#,Java,R,…
48
key value
set x 10 x 10
hset h y 5 h yà5
hset h1 name twohset h1 value 2_ h1
nameàtwovalueà2
hmset p:22 name Alice age 25 p:22 nameàAliceageà25
sadd s 20___sadd s Alicesadd s Alice s {20,Alice}
rpush l arpush l blpush l c l (c,a,b)
(simple value)
(hash table)
(set)
(list)
key maps to:
ExampleofRedis Commands
get x>> 10
hget h y>> 5
hkeys p:22>> name , age
smembers s>> 20 , Alice
scard s>> 2
llen l>> 3
lrange l 1 2 >> a , b
lindex l 2>> b
lpop l >> c
rpop l >> b
49
key value
set x 10 x 10
hset h y 5 h yà5
hset h1 name twohset h1 value 2_ h1
nameàtwovalueà2
hmset p:22 name Alice age 25 p:22 nameàAliceageà25
sadd s 20___sadd s Alicesadd s Alice s {20,Alice}
rpush l arpush l blpush l c l (c,a,b)
(simple value)
(set)
(list)
key maps to:
ExampleofRedis Commands
get x>> 10
hget h y>> 5
hkeys p:22>> name , age
smembers s>> 20 , Alice
scard s>> 2
llen l>> 3
lrange l 1 2 >> a , b
lindex l 2>> b
lpop l >> c
rpop l >> b
(hash table)
50
AdditionalNotes
• Akeycanbeany<256MBbinarystring- Forexample,JPEGimage
• Somekeyoperations:- Listallkeys:keys *- Removeallkeys:flushall- Checkifakeyexists:exists k
• Youcanconfigurethepersistencymodel- save m k meanssaveeverym secondsifatleastk keyshavechanged
51
Redis Cluster
• Add-onmoduleformanagingmulti-nodeapplicationsoverRedis• Master-slavearchitectureforsharding +replication- Multiplemastersholdingpairwisedisjointsetsofkeys,everymasterhasasetofslavesforreplicationandshardingMaster and Slave nodesNodes are all connected and functionally equivalent, but actually there are two kind of nodes: slave and master nodes:
RedundancyIn the example there are two replicas per every master node, so up to two random nodes can go down without issues.
Working with two nodes down is guaranteed, but in the best case the cluster will continue to work as long as there is at least one node for every hash slot.
Source:http://redis.io/presentation/Redis_Cluster.pdf
Blue… master,Yellow… replicasUpto2randomnodescangodownwithoutissues becauseofredudancy
52
Whentouseit
• Useit:- Allaccesstothedatabasesisviaprimarykey- Storingsessioninformation(websession)- userorproductprofiles(singleGEToperation)- shoppingcardinformation(basedonuserid)
• Don'tuseit:- relationshipsbetweendifferentsetsofdata- querybydata(basedonvalues)- operationsonmultiplekeysatatime
53
• Introduction• TransactionConsistency• 4maindatamodels﹣ Key-ValueStores(e.g.,Redis)﹣ Column-FamilyStores(e.g.,Cassandra)﹣ DocumentStores(e.g.,MongoDB)﹣ GraphDatabases(e.g.,Neo4j)
• ConcludingRemarks
Outline
54
keyspace
2TypesofColumnStores
sid name address year faculty861 Alice Haifa 2 NULL753 Amir London NULL CS955 Ahuva NULL 2 IE
StandardRDB
id sid1 8612 7533 955
id name1 Alice2 Amir3 Ahuva
id address1 Haifa2 London
id year1 23 2
id faculty2 CS3 IE
Eachcolumnstoredseparately.Why?Efficiency (fetchonlyrequiredcolumns),
compression,sparse dataforfree
1 sid:861 name:Aliceaddress:Haifa ts:20
2 sid:753 name:Amiraddress:London ts:22
3 sid:955name:Ahuva ts:32
1 year:2 ts:26
2 faculty:CS ts:25email:{prime:c@d ext:c@e}
3 year:2 faculty:IE ts:32email:{prime:a@b ext:a@c}
columnfamily
columnfamily
“column”
“supercolumn”
Column-FamilyStore (NoSQL)
timestampforconflicts
Columnstore (stillSQL)
Cassandradatamodel
55
ColumnStores
• Thetwooftenmixedas“columnstore”à confusion- SeeDanielAbadi’sblog:http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
• Commonidea:don’tkeeparowinaconsecutiveblock,splitviaprojection- Columnstore:eachcolumnisindependent- Column-familystore:eachcolumnfamilyisindependent
• Bothprovidesomemajorefficiencybenefitsincommonread-mainlyworkloads- Givenaquery,loadtomemoryonlytherelevantcolumns- Columnscanoftenbehighlycompressedduetovaluesimilarity- Effectiveformforsparseinformation(noNULLs,nospace)
• Column-family storeishandleddifferentlyfromRDBs,oftenrequiringadesignatedquerylanguage
56
ExamplesSystems
• Columnstore(SQL):- MonetDB (started2002,Univ.Amsterdam)- VectorWise (spawnedfromMonetDB)- Vertica (M.Stonebraker)- SAP SybaseIQ- Infobright
• Column-familystore (NoSQL):- Google’sBigTable (maininspirationtocolumnfamilies)- ApacheHBase (usedbyFacebook,LinkedIn,Netflix...,CPinCAP)- Hypertable- ApacheCassandra (APinCAP)
57
Example:ApacheCassandra
• InitiallydevelopedbyFacebook- Open-sourcedin2008
• Usedby1500+businesses,e.g.,Comcast,eBay,GitHub,Hulu,Instagram,Netflix,BestBuy,...
• Column-familystore- Supportskey-valueinterface- ProvidesaSQL-likeCRUDinterface:CQL
• UsesBloomfilters- Aninterestingmembershiptestthatcanhavefalsepositivesbutneverfalsenegatives,
behaveswellstatistically
• BASEconsistencymodel(APinCAP)- Gossipprotocol(constantcommunication)toestablishconsistency- Ring-basedreplicationmodel
58
ExampleBloomFilterk=3
Insert(x,H)
Member(y,H)
y1 = is not in H (why ?)
y2 may be in H (why ?)
59
3
2
4
88
3
2
4
Cassandra’sRingModel
1
5
7
6
write(k,t) hash(k)=2
write(k,t)
ReplicationFactor=3
Advantage:Flexibility/easeofclusterredesign
Coordinatornode
Primaryresponsible
Additionalreplicas
Seemore:https://www.hakkalabs.co/articles/how-cassandra-stores-data
60
Whentouseit(e.g.Cassandra)
• Useit:- Eventlogging(multipleapplicationscanwriteindifferentcolumnsandrow-key:appname:timestamp)
- CMS:Storeblogentrieswithtags,categories,linksindifferentcolumns- Counters:e.g.visitorsofapage
• Don'tuseit:- ifyourequireACID,consistency- ifyouhavetoaggregatesacrossalltherows- ifyouchangequerypatternsoften(inRDMS schemachangesarecostly,inCassandrda querychangesare)
61
• Introduction• TransactionConsistency• 4maindatamodels﹣ Key-ValueStores(e.g.,Redis)﹣ Column-FamilyStores(e.g.,Cassandra)﹣ DocumentStores(e.g.,MongoDB)﹣ GraphDatabases(e.g.,Neo4j)
• ConcludingRemarks
Outline
62
DocumentStores
• Similarinnaturetokey-valuestore,butvalueistreestructured asadocument
• Motivation:avoidjoins;ideally,allrelevantjoinsalreadyencapsulatedinthedocumentstructure
• Adocumentisanatomicobjectthatcannotbesplitacrossservers- Butadocumentcollectionwillbesplit
• Moreover,transactionatomicity istypicallyguaranteedwithinasingledocument
• Modelgeneralizescolumn-familyandkey-valuestores
63
ExampleDatabases
• MongoDB- Nextslides
• ApacheCouchDB- EmphasizesWebaccess
• RethinkDB- Optimizedforhighlydynamicapplicationdata
• RavenDB- Deignedfor.NET,ACID
• Clusterpoint Server- XMLandJSON,acombinedSQL/JavaScriptQL
64
MongoDB
• Opensource,1strelease2009,documentstore- Actually,anextendedformatcalledBSON (BinaryJSON =JavaScriptObjectNotation)for
typingandbettercompression
• Supportsreplication (master/slave),sharding (horizontalpartitioning)- Developerprovidesthe"shardkey"– collectionispartitionedbyrangesofvaluesofthis
key
• Consistency guarantees,CPofCAP• UsedbyAdobe(experiencetracking),Craigslist,eBay,FIFA(videogame),LinkedIn,McAfee
• ProvidesconnectortoHadoop- ClouderaprovidestheMongoDBconnectorindistributions
65
DataExample:High-level
{name:"Alice",age:21,status:"A",groups:["algorithms","theory"]
}
Document
Source:Modifiedfromhttps://docs.mongodb.com/v3.0/core/crud-introduction/
Collection{
name:"Alice",age:21,status:"A",groups:["algorithms","theory"]
}
{name:"Bob",age:18,status:"B",groups:["database","cooking"]
}
{name:"Charly",age:22,status:"A",groups:["database","cars"]
}
{name:"Dorothee",age:16,status:"A",groups:["cars","sports"]
}
66
MongoDBTerminology
RDBMS• Database• Table• Record/Row/Tuple• Column• Primarykey• Foreignkey
MongoDB• Database• Collection• Document• Field• _id
67
MongoDB DataModel
• JavaScriptObjectNotation(JSON)model• Database =setofnamedcollections• Collection =sequenceofdocuments• Document ={attribute1:value1,...,attributek:valuek}• Attribute=string(attributei≠attributej)• Value =primitive value(string,number,date,...),oradocument,oranarray- Array =[value1,...,valuen]
• Keyproperties:hierarchical(likeXML),noschema- Collectiondocsmayhavedifferentattributes
generalizes relation
generalizes tuple
68
DataExample
{item:"ABC2",details:{model:"14Q3",manufacturer:"M1Corporation"},stock:[{size:"M",qty:50}],category:"clothing”
}
{item:"MNO2",details:{model:"14Q3",manufacturer:"ABCCompany"},stock:[{size:"S",qty:5},{size:"M",qty:5},{size:"L",qty:1}],category:"clothing”
}
Collectioninventory
db.inventory.insert({item:"ABC1",details:{model:"14Q3",manufacturer:"XYZCompany"},stock:[{size:"S",qty:25},{size:"M",qty:50}],category:"clothing"}
) Document insertionSource:Modifiedfromhttps://docs.mongodb.com/v3.0/core/crud-introduction/
69
ExampleofaSimpleQuery
{_id:"a",cust_id:"abc123",status:"A",price:25,items:[{sku:"mmm",qty:5,price:3},
{sku:"nnn",qty:5,price:2}]}{
_id:"b",cust_id:"abc124",status:"B",price:12,items:[{sku:"nnn",qty:2,price:2},
{sku:"ppp",qty:2,price:4}]}
Collectionordersdb.orders.find({status:"A"},{cust_id:1,price:1,_id:0}
)
InSQLitwouldlooklikethis:SELECTcust_id,priceFROMordersWHEREstatus="A"
{cust_id:"abc123",price:25
}
selection
projection
Findallordersandpricewithwithstatus"A"
70
Whentouseit
• Useit:- Eventlogging:differenttypesofeventsacrossanenterprise- CMS:usercomments,registration,profiles,web-facingdocuments- E-commerce:flexibleschemaforproducts,evolvedatamodels
• Don'tuseit:- ifyourequireatomiccross-documentoperations- queriesagainstvaryingaggregatestructures
71
• Introduction• TransactionConsistency• 4maindatamodels﹣ Key-ValueStores(e.g.,Redis)﹣ Column-FamilyStores(e.g.,Cassandra)﹣ DocumentStores(e.g.,MongoDB)﹣ GraphDatabases(e.g.,Neo4j)
• ConcludingRemarks
Outline