NoSQL Data Stores - Roma Tre University
NoSQL Systems > Timeline
2003 Memcached
2006 Google BigTable
2007 Amazon Dynamo
2007 HBase
2008 Cassandra, CouchDB
2009 Project Voldemort, Redis, Riak, MongoDB
30/05/2011 Sistemi NoSQL 2
NoSQL Systems > Memcached
• What is memcached
– Caching system intended to alleviate database load.
– In-memory key-value store for small chunks of data.
• Extremely successful
– Facebook, Yahoo, Wikipedia, eBay, Digg, …
Memcached > How does it work
Super simple!
v = memcachedClient.get(key);
if (v == NULL) {
    v = db.query( SOME SLOW QUERY );
    memcachedClient.set(key, v);
}
Key-value cache
1. Keys are hashed
2. The hash table spans an arbitrary number of servers
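The lookup pattern above, together with the hashing of keys across servers, can be sketched in Python. This is a minimal illustration under stated assumptions, not memcached's client API: `FakeCacheServer`, `ShardedCache`, `slow_query`, and `cached_lookup` are hypothetical names, each "server" is a plain dictionary, and modulo hashing with SHA-256 is assumed only for simplicity.

```python
import hashlib


class FakeCacheServer:
    """Hypothetical stand-in for one cache server: an in-memory dict."""

    def __init__(self):
        self.data = {}


class ShardedCache:
    """Hypothetical client that spreads keys over servers by hashing."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        # Hash the key and map it onto one server (modulo is an assumption).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def get(self, key):
        return self._server_for(key).data.get(key)

    def set(self, key, value):
        self._server_for(key).data[key] = value


def slow_query(key, calls):
    """Hypothetical slow database query; `calls` records each invocation."""
    calls.append(key)
    return f"row-for-{key}"


def cached_lookup(cache, key, calls):
    # Same steps as the slide: try the cache, fall back to the database on a miss.
    value = cache.get(key)
    if value is None:
        value = slow_query(key, calls)
        cache.set(key, value)
    return value


cache = ShardedCache([FakeCacheServer() for _ in range(3)])
db_calls = []
first = cached_lookup(cache, "user:42", db_calls)   # miss: goes to the database
second = cached_lookup(cache, "user:42", db_calls)  # hit: served from the cache
```

After the two lookups the database has been queried only once, which is the load reduction the slide refers to.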
NoSQL Systems > Google BigTable
• BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.
• Petabytes of data across thousands of commodity servers.
• Built on top of Google File System
Google BigTable > Data Model
| Row Id | ColumnFamily1 | ColumnFamily2 | … | ColumnFamilyN |
|---|---|---|---|---|
| rowid1 | qualifier1 = “abc”, qualifier2 = “def”, qualifier3 = “123”, … | qualifier1 = “xyz”, qualifier5 = “fgh” | … | … |
| rowid2 | … | | | |
| … | | | | |

• Column Families are (the only things) defined in the schema
• Qualifiers are added dynamically.
• Simple queries
– Get a row by key
– Get a range of rows by (start key, end key)
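The data model and the two query types above can be sketched as a small in-memory structure. This is an illustrative sketch, not BigTable's API: `SimpleTable` and its methods are hypothetical names, and treating the end key of a range as exclusive is an assumption.

```python
from bisect import bisect_left


class SimpleTable:
    """Hypothetical table: row key -> column family -> qualifier -> value."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created (the schema).
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no declaration: they appear on first write.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        """Get a row by key."""
        return self.rows.get(row_key)

    def scan(self, start_key, end_key):
        """Get rows with start_key <= key < end_key, in key order."""
        keys = sorted(self.rows)
        lo = bisect_left(keys, start_key)
        hi = bisect_left(keys, end_key)
        return [(k, self.rows[k]) for k in keys[lo:hi]]


table = SimpleTable(["ColumnFamily1", "ColumnFamily2"])
table.put("rowid1", "ColumnFamily1", "qualifier1", "abc")
table.put("rowid1", "ColumnFamily2", "qualifier5", "fgh")
table.put("rowid2", "ColumnFamily1", "qualifier2", "def")
row = table.get("rowid1")
in_range = [key for key, _ in table.scan("rowid1", "rowid2")]
```

Writing to an undeclared column family fails, while a new qualifier inside an existing family is accepted without any schema change.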
Google BigTable > Data Model > Example
• Student – Course
– 1 student > many courses
– 1 course > many students
Relational schema:
– Students: id (PK), name, email, birthdate
– Course: id (PK), title, description, teacher_id
– Student2Course: student_id, course_id
De‐normalized data
Single key‐space
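The slide's original figure is not available in this transcript, so the following is only one possible reading of "de-normalized data" in a "single key-space". The row-key prefixes (`student:`, `course:`), the column-family names, and all values are hypothetical.

```python
# Hypothetical layout: students and courses share one table and one key space.
rows = {
    "student:1": {
        "info": {"name": "Alice", "email": "alice@example.org"},
        # One qualifier per enrolled course; the title is copied in (denormalized).
        "courses": {"course:10": "Databases II"},
    },
    "course:10": {
        "info": {"title": "Databases II", "teacher_id": "7"},
        # One qualifier per enrolled student; the name is copied in.
        "students": {"student:1": "Alice"},
    },
}

# Both entity types live in the same key space, told apart by the key prefix.
student_keys = sorted(k for k in rows if k.startswith("student:"))

# Listing a student's courses needs one row read and no join table.
alice_courses = sorted(rows["student:1"]["courses"].values())
```

The price of this layout is duplication: renaming a course means updating every student row that copied its title.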
Google BigTable > Infrastructure
• Partition model: sharding on the row key:
– Data is divided into tablets
– Each tablet is defined by the range of row keys it is responsible for (start key – end key)
– Each tablet is served by one tablet server at a time
– Each tablet server may serve (has the lock for) many tablets.
• Distributed locking service called Chubby
– Manages tablet servers lifecycle
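Range-based sharding as described above can be sketched with a sorted list of tablet start keys. This is a simplified illustration, not BigTable's implementation: `TabletMap`, the server names, and the split points are hypothetical, and locking through Chubby is left out.

```python
from bisect import bisect_right


class TabletMap:
    """Hypothetical routing table: each tablet covers a contiguous row-key range."""

    def __init__(self, start_keys, servers):
        # start_keys[i] is the first row key of tablet i; tablet i ends where
        # tablet i + 1 starts. The first start key is "" so every key is covered.
        assert start_keys == sorted(start_keys) and start_keys[0] == ""
        assert len(start_keys) == len(servers)
        self.start_keys = start_keys
        self.servers = list(servers)

    def tablet_for(self, row_key):
        # The owning tablet is the last one whose start key is <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def server_for(self, row_key):
        return self.servers[self.tablet_for(row_key)]


# Three tablets on two tablet servers: a server may serve many tablets,
# but each tablet is served by exactly one server at a time.
tablets = TabletMap(["", "h", "p"], ["server-a", "server-b", "server-a"])
owner_of_apple = tablets.server_for("apple")
owner_of_kiwi = tablets.server_for("kiwi")
owner_of_zebra = tablets.server_for("zebra")
```

Because tablets are key ranges, a range query over adjacent row keys usually touches one tablet or a few neighbouring ones.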
• Three-level hierarchy to store tablet location
– Analogous to a B+ Tree
[Figure: Master Server and Tablet Servers]
[Figure: Client, Master Server, and Tablet Servers; arrows labeled request, request, response]
• Strong consistency
– Only one tablet server is responsible for a given piece of data.
– Replication is handled on the GFS layer
• Trade-off with availability
– If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned
NoSQL Systems > Amazon Dynamo
“An extra tenth of a second in response times will cost us 1% in sales” - Amazon
• Dynamo: Highly available key‐value store
• Challenge: reliability at massive scale
– Tens of millions of customers.
– Tens of thousands of servers.
Amazon Dynamo > Data Model
• Binary objects (i.e. blobs) identified by unique keys
• Query model:
– Simple read and write operations to data retrieved by primary key
– No operations span multiple data items
Amazon Dynamo > Infrastructure
• Partitioning similar to P2P (Chord, Pastry, etc.)
– Keys are hashed.
– The range of the hash function is treated as a circular space (ring).
– Each node is responsible for a region of the ring.
– Distributed Hash Table (DHT)
NoSQL Systems > Amazon Dynamo
[Figure: hash ring; labels “A”, “AE107FB…”, and nodes marked N=1, N=2, N=2, N=3]
• Each node is responsible for the region between it and its Nth predecessor.
• N is tuned on a per-node basis
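The ring partitioning described above can be sketched as follows. This is a simplified illustration rather than Dynamo's implementation: `HashRing` and the node names are hypothetical, SHA-256 is an arbitrary choice of hash function, and a single N is passed per lookup instead of being tuned per node.

```python
import hashlib
from bisect import bisect_left


def ring_position(name):
    """Map a string to a position on the ring (SHA-256 is an assumption)."""
    return int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each node owns the arc that ends at its own position."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.nodes]

    def _first_index(self, key):
        # Walk clockwise from the key's position to the first node; wrap around.
        return bisect_left(self.positions, ring_position(key)) % len(self.nodes)

    def owner(self, key):
        return self.nodes[self._first_index(key)][1]

    def preference_list(self, key, n):
        """The owner plus its clockwise successors, n distinct nodes at most."""
        start = self._first_index(key)
        count = min(n, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owner = ring.owner("some-key")
replicas = ring.preference_list("some-key", 3)
```

A useful property of this scheme is that removing a node only moves the keys that node owned; every other key keeps its owner.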
• Replication
– Each data item is replicated at many hosts
• Eventual consistency
– Updates are propagated to replicas asynchronously
– The system eventually reaches a consistent state
• Tradeoff between consistency and availability
– Number of replicas is crucial
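Asynchronous propagation and the resulting stale reads can be illustrated with a toy store. This is a sketch under strong simplifications: `EventuallyConsistentStore` is a hypothetical class, replication is simulated with an in-process queue, and concurrent conflicting writes are ignored.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical key-value store with asynchronous replication."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to every replica

    def put(self, key, value):
        # The write is acknowledged after reaching a single replica.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica_index):
        # A read may be served by any replica, including a stale one.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        """Apply all queued updates, as a background process would over time."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore(replica_count=3)
store.put("cart:1", "book")
stale_read = store.get("cart:1", replica_index=2)  # None: not yet propagated
store.propagate()
fresh_read = store.get("cart:1", replica_index=2)  # "book": replicas converged
```

The write returns before all replicas are updated, which keeps the store available for writes at the cost of a window in which reads can see old data.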
Case Study > Facebook Messages
• Real-time, reliable messaging system that combines chat, messages and emails.
• 135+ billion messages per month
• Two main usage patterns
– A short set of temporal data that tends to be volatile
– An ever growing set of data that rarely gets accessed
• Candidate systems:
– MySQL
– Apache Cassandra
– Apache HBase
Facebook Messages > MySQL
• Attractive choice:
+ Facebook core infrastructure is MySQL-based
  • It is indeed a giant LAMP application
+ Facebook team has extensive knowledge in running and managing MySQL
• But…
– MySQL clustering is hard to maintain (and scale)
– MySQL performance suffers with large indexes and data sets
Facebook Messages > Apache HBase
• BigTable’s open-source clone
– Extensible record store
– Strong consistency
  • Availability trade-off
• Part of the Hadoop ecosystem
– Built on top of HDFS
– Integrates with Hive, ZooKeeper, etc.
Facebook Messages > Apache Cassandra
• Marriage between BigTable and Dynamo
– Data model: Extensible record store (BigTable)
– Infrastructure: Distributed Hash Table (Dynamo)
  • Eventual consistency
  • High availability
• Developed by Facebook itself
– To serve the (old) inbox system
Facebook Messages > Evaluation results
• MySQL soon discarded
• HBase vs Cassandra

| | Data model | Consistency model | Availability |
|---|---|---|---|
| HBase | Extensible record store | Strong consistency | Replicas managed by HDFS; region servers are single points of failure |
| Cassandra | Extensible record store | Eventual consistency | No single point of failure |
• HBase won
– Strong consistency is a better match for real-time systems
NoSQL Systems > Overview
• We have seen:
– Extensible record stores
  • BigTable, HBase, Cassandra
– Key-value stores
  • Dynamo
• There’s more to it!
– Document stores
NoSQL Systems > Document stores
• Systems that store collections of documents
• What is a document?
– Generally, an object with a number of fields, whose values can be scalars, lists, or nested documents as well
  • e.g.: XML, JSON
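The definition above can be made concrete with a small JSON example. The field names and values below are made up for illustration; only the standard-library `json` module is used.

```python
import json

# A hypothetical document: field names and values are invented for this sketch.
document = {
    "title": "NoSQL Data Stores",        # scalar value
    "tags": ["nosql", "databases"],      # list value
    "author": {                          # nested document
        "name": "Jane Doe",
        "contacts": {"email": "jane@example.org"},
    },
}

# A document maps directly to JSON text and back.
encoded = json.dumps(document, sort_keys=True)
decoded = json.loads(encoded)
nested_email = decoded["author"]["contacts"]["email"]
```

A document store keeps collections of such objects, and two documents in the same collection need not share the same fields.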
Case Study > Guardian.co.uk
Guardian.co.uk > 2005‐09
Modern Java application
– Strong model in Java
– Oracle RDBMS
– Database abstracted with ORM
Problems: increasing complexity
– Complex Hibernate binding (10,000+ lines of XML config)
– Lots of optimisations
– Complex caching strategy
– Load becoming an issue
– …
Guardian.co.uk > 2009‐10
• Introduce yet more caching
– Memcached
• Decouple applications from db by building APIs
– Power APIs using scalable technologies (Apache Solr)
– JSON results
[Figure label: DB Load]
Three models now (a headache):
– RDBMS Tables
– Java objects
– JSON API
JSON model is very simple:
– Multiple domain objects expressed in a single doc
– Can be designed in a forward-extensible way
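The two properties of the JSON model can be illustrated with a short sketch. The document below is hypothetical (field names, ids, and values are invented), and "forward-extensible" is read here as "readers ignore fields they do not know", which is an interpretation rather than something the slides spell out.

```python
import json

# Hypothetical article document: the article and its tags would be separate
# tables in the relational model, but travel together in one JSON document.
article_json = """
{
  "id": "article-1",
  "headline": "Example headline",
  "tags": [
    {"id": "tag-1", "name": "technology"},
    {"id": "tag-2", "name": "databases"}
  ],
  "newField": "added later"
}
"""


def read_article(raw):
    """Pick only the fields this reader knows; unknown fields are ignored."""
    doc = json.loads(raw)
    return {
        "headline": doc["headline"],
        "tag_names": [tag["name"] for tag in doc.get("tags", [])],
    }


article = read_article(article_json)
```

Because `read_article` never inspects `newField`, a producer can add fields to the document without breaking existing consumers.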
[Figure labels: Article, Tags]
What if the JSON API was the primary model?
• CouchDB
• MongoDB
NoSQL Systems > MongoDB vs CouchDB
| | CouchDB | MongoDB |
|---|---|---|
| Data Model | Collections of JSON docs | Collections of BSON docs |
| Queries | Low-level query language | Rich, declarative query language |
| Consistency Model | Eventual Consistency | Strong Consistency (tunable though) |
| Replication | Master-Master | Master-Slave |
| Scalability | Through replication | Sharding |
• MongoDB was chosen:
– Can easily express complex queries
– Good if you come from RDBMS
– No need for extreme scalability (where CouchDB shines)
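The difference between a declarative query and a lower-level, map-function style of querying can be sketched in plain Python. Neither part uses a real database driver: `matches` is a hypothetical evaluator for a tiny subset of MongoDB-style filter documents (equality and `$gt` only), and `map_by_section` only imitates the shape of a view map function that emits key/value pairs. The sample documents are invented.

```python
def matches(doc, query):
    """Hypothetical evaluator for a small subset of MongoDB-style filters."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 2009}; only $gt is handled here.
            if "$gt" in condition and not (value is not None and value > condition["$gt"]):
                return False
        elif value != condition:
            return False
    return True


def map_by_section(doc):
    """Map-function style: emit (key, value) pairs for each document."""
    if "section" in doc:
        yield doc["section"], doc["title"]


docs = [
    {"title": "A", "section": "politics", "year": 2010},
    {"title": "B", "section": "sport", "year": 2011},
    {"title": "C", "section": "politics", "year": 2008},
]

# Declarative: state what is wanted and let the store decide how to find it.
query = {"section": "politics", "year": {"$gt": 2009}}
declarative = [d["title"] for d in docs if matches(d, query)]

# Low-level: build an index with a map function, then look keys up in it.
index = {}
for d in docs:
    for key, value in map_by_section(d):
        index.setdefault(key, []).append(value)
low_level = index["politics"]
```

In the declarative form a new condition is one more entry in the filter, while in the map-function form a new access path typically means writing another function and building another index.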
NoSQL Systems > Links and References
• R. Cattell – Scalable SQL and NoSQL Datastores
• R. Cattell, M. Stonebraker – Ten Rules for Scalable Performance in “Simple Operation” Datastores
• M. Stonebraker – SQL vs NoSQL Databases
• A. Popescu – MyNoSQL Blog
• Chang et al. – Google BigTable
• DeCandia et al. – Amazon Dynamo
• We have encountered:
– Cassandra – cassandra.apache.org
– HBase – hbase.apache.org
– CouchDB – couchdb.apache.org
– MongoDB – http://www.mongodb.org
– Memcached – memcached.org