NoSQL session, GlueCon, May 2010
DESCRIPTION
Overview of NoSQL at GlueCon. Talk given by Dwight Merriman from 10gen/MongoDB.
TRANSCRIPT
NoSQL : Channeling the Data Explosion
Dwight Merriman
CEO, 10gen
@dmerr dmerr.tumblr.com
GlueCon 2010
The database world is changing
No longer one-size-fits-all
NoSQL = non-relational, next-generation operational data stores and databases
Scaling Out
no joins + light transactional semantics = horizontally scalable architectures
Why?
http://www.globalnerdy.com/2007/09/07/multicore-musings/
cloud
commodity
How the NoSQL Products Vary
• What’s the same
  – No joins
  – No complex transactions
• What varies
  – Scale-out model
  – Consistency model
  – Data model
Scaling Out
distribution & query models
Consistent hashing
Order preserving range chunking
Scatter gather
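The three distribution and query models named above can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the hash function (FNV-1a), the node and shard names, the chunk boundaries, and every function name here are made up for the example and are not taken from the talk or from any particular product.

```javascript
// FNV-1a 32-bit string hash. A simple stand-in chosen for this sketch only.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Consistent hashing: each node owns several points on a hash ring.
function buildRing(nodes, pointsPerNode) {
  const ring = [];
  for (const node of nodes) {
    for (let i = 0; i < pointsPerNode; i++) {
      ring.push({ point: fnv1a(node + '#' + i), node });
    }
  }
  ring.sort((a, b) => a.point - b.point);
  return ring;
}

// A key belongs to the first ring point at or after its hash, wrapping around.
function ringLookup(ring, key) {
  const h = fnv1a(key);
  for (const entry of ring) {
    if (entry.point >= h) return entry.node;
  }
  return ring[0].node;
}

// Order-preserving range chunking: each chunk owns a contiguous key range
// [min, max). A max of null means "unbounded". Boundaries are hypothetical.
const chunks = [
  { min: '', max: 'h', node: 'shard-a' },
  { min: 'h', max: 'p', node: 'shard-b' },
  { min: 'p', max: null, node: 'shard-c' },
];

function chunkLookup(chunkList, key) {
  return chunkList.find(c => key >= c.min && (c.max === null || key < c.max)).node;
}

// A range query [lo, hi) only needs the chunks that overlap the range.
function chunksForRange(chunkList, lo, hi) {
  return chunkList
    .filter(c => (c.max === null || lo < c.max) && hi > c.min)
    .map(c => c.node);
}

// Scatter-gather: when a query is not on the partition key, send it to
// every shard and combine the partial results.
function scatterGather(shards, predicate) {
  return Object.values(shards).flatMap(docs => docs.filter(predicate));
}
```

With the ring, adding a node only moves keys onto the new node; with range chunks, neighbouring keys stay together, so a range scan can touch a subset of shards rather than all of them.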
Data models
no joins + light transactional semantics = horizontally scalable architectures
Important side effect : new data models = improved ways to develop apps
Data Models
• Key/value
• Column-oriented “bigtable-style”
• Document-oriented (JSON)
Data Models
{
  title: 'Too Big to Fail',
  author: 'John S',
  ts: Date("05-Nov-09 10:33"),
  comments: [
    { author: 'Ian White', comment: 'Great article!' },
    { author: 'Joe Smith', comment: 'But how fast is it?',
      replies: [ { author: 'Jane Smith', comment: 'scalable?' } ] }
  ],
  tags: ['finance', 'economy']
}
db.posts.find( { tags : 'economy' } ).sort({ts:-1}).limit(10).skip(10)
db.posts.find( { "comments.author" : "Ian White" } )
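To make the meaning of the two queries concrete, here is a rough plain-JavaScript equivalent over an in-memory array. It is a sketch of the matching rules only, not how MongoDB evaluates queries; the sample posts and the function names (`findByTag`, `findByCommentAuthor`, `newestPage`) are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  { title: 'Too Big to Fail', ts: new Date(2009, 10, 5, 10, 33),
    tags: ['finance', 'economy'],
    comments: [{ author: 'Ian White', comment: 'Great article!' }] },
  { title: 'Scaling Out', ts: new Date(2010, 4, 1, 9, 0),
    tags: ['databases'],
    comments: [{ author: 'Joe Smith', comment: 'But how fast is it?' }] },
  { title: 'Commodity Hardware', ts: new Date(2010, 2, 15, 12, 0),
    tags: ['economy'],
    comments: [] },
];

// { tags : 'economy' } matches a document whose tags array contains the value.
function findByTag(docs, tag) {
  return docs.filter(d => (d.tags || []).includes(tag));
}

// { "comments.author" : "Ian White" } reaches into each element of the
// comments array and matches if any embedded comment has that author.
function findByCommentAuthor(docs, author) {
  return docs.filter(d => (d.comments || []).some(c => c.author === author));
}

// sort({ts:-1}) orders newest first; skip and limit then select one page.
function newestPage(docs, skip, limit) {
  return [...docs].sort((a, b) => b.ts - a.ts).slice(skip, skip + limit);
}

const secondPage = newestPage(findByTag(posts, 'economy'), 10, 10);
```

In the first query the skip is applied before the limit even though `limit` is written first in the chain, so it reads as "the second page of ten".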
Influences
CAP
It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
• Availability
• Atomic consistency
in all fair executions (including those in which messages are lost).
Consistency Models - CAP
• Choices are AP or CP
• Write Availability, not Read Availability, is the Main Question
• It’s not all about CAP
Eventual consistency makes these non-availability aspects better:
• Multi data center
• Speed
• Even load distribution
Eventual Consistency
Read(x) : 1, 2, 2, 4, 4, 4, 4 …
Could we get this?
Read(x) : 1, 2, 1, 4, 2, 4, 4, 4 …
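One way to see how the second read sequence could arise is a toy model with two replicas that apply the same writes at different times. The sketch below is an assumption-laden illustration: the replica names `a` and `b`, the timeline, and the read schedules are invented to reproduce the two sequences above, and no real product is being modeled.

```javascript
// Value of x visible on each hypothetical replica at each time step.
// Replica b lags behind replica a but eventually converges to 4.
const timeline = [
  { a: 1, b: 1 },
  { a: 2, b: 1 },
  { a: 2, b: 1 },
  { a: 4, b: 2 },
  { a: 4, b: 2 },
  { a: 4, b: 4 },
  { a: 4, b: 4 },
  { a: 4, b: 4 },
];

// schedule[t] names the replica the client happens to read from at step t.
function readSequence(states, schedule) {
  return schedule.map((replica, t) => states[t][replica]);
}

// True if the client never observes x going backwards.
function isMonotonic(reads) {
  for (let i = 1; i < reads.length; i++) {
    if (reads[i] < reads[i - 1]) return false;
  }
  return true;
}

// A client that always reads from replica a sees stale values at worst.
const sticky = readSequence(timeline, ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']);

// A client that is routed to either replica can see x move backwards.
const wandering = readSequence(timeline, ['a', 'a', 'b', 'a', 'b', 'a', 'a', 'b']);
```

In this model the answer to "could we get this?" is yes unless the client sticks to one replica or the system adds some form of session guarantee such as monotonic reads.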
Terms
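• R: number of replicas contacted for a read
• W: number of replicas that must acknowledge a write
• N: number of replicas holding each item
  – R+W>N has nice properties
• Sloppy quorum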
R+W>N
With R+W > N, we can’t have both fast local reads and fast local writes at the same time if all the data centers are equal peers?
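The "nice property" of R+W>N is that every read set shares at least one replica with every write set, so a read always touches a replica that saw the latest acknowledged write. The brute-force check below is a small sketch of that claim. The function names are hypothetical, and the last helper assumes one replica per data center, which the slides do not state.

```javascript
// All k-element subsets of the replica ids {0, ..., n-1}.
function subsets(n, k) {
  const out = [];
  function rec(start, chosen) {
    if (chosen.length === k) { out.push(chosen.slice()); return; }
    for (let i = start; i < n; i++) {
      chosen.push(i);
      rec(i + 1, chosen);
      chosen.pop();
    }
  }
  rec(0, []);
  return out;
}

// True if every possible read set of size r intersects every possible
// write set of size w among n replicas.
function quorumsAlwaysOverlap(n, r, w) {
  const readSets = subsets(n, r);
  const writeSets = subsets(n, w);
  return readSets.every(rs => writeSets.every(ws => rs.some(x => ws.includes(x))));
}

// All (r, w) settings that satisfy r + w > n.
function strictQuorumConfigs(n) {
  const configs = [];
  for (let r = 1; r <= n; r++) {
    for (let w = 1; w <= n; w++) {
      if (r + w > n) configs.push({ r, w });
    }
  }
  return configs;
}

// Assumption: one replica per data center, so an operation is "local"
// only when it touches a single replica (r === 1 or w === 1).
function hasFullyLocalConfig(n) {
  return strictQuorumConfigs(n).some(c => c.r === 1 && c.w === 1);
}
```

Under that one-replica-per-data-center assumption, `hasFullyLocalConfig(n)` is false for any n of 2 or more: either reads or writes have to cross to another data center.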
Network Partitions
Trivial Network Partitions
Sometimes we need global state / more consistency
• Unique key constraints
  – User registration
• ACL changes
• Are we surprising the user?
  – read-your-own-writes
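The user-registration case can be illustrated with a toy model: two replicas that accept writes independently during a partition will both accept the same username, while routing registrations through one authority rejects the second attempt. Everything here (the `Replica` class, `findConflicts`, `registerViaAuthority`, the usernames and addresses) is a hypothetical sketch, not any product's API.

```javascript
// A replica that accepts registrations using only its local state.
class Replica {
  constructor(name) {
    this.name = name;
    this.users = new Map(); // username -> email
  }

  // Succeeds whenever the name is unused on this replica alone.
  registerLocal(username, email) {
    if (this.users.has(username)) return false;
    this.users.set(username, email);
    return true;
  }
}

// After the partition heals: usernames claimed on both sides by different people.
function findConflicts(a, b) {
  const conflicts = [];
  for (const [username, email] of a.users) {
    if (b.users.has(username) && b.users.get(username) !== email) {
      conflicts.push(username);
    }
  }
  return conflicts;
}

// The more consistent alternative: every registration is checked against
// one authoritative map, so uniqueness is decided in a single place.
function registerViaAuthority(authority, username, email) {
  if (authority.has(username)) return false;
  authority.set(username, email);
  return true;
}

const east = new Replica('east');
const west = new Replica('west');
const acceptedEast = east.registerLocal('alice', 'alice@east.example');
const acceptedWest = west.registerLocal('alice', 'alice@west.example');
const conflicts = findConflicts(east, west);
```

Both local registrations succeed and the conflict only shows up later, which is exactly the kind of surprise a unique key constraint is meant to prevent.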
Could it be the case that…
uptime( CP + average developer ) >= uptime( AP + average developer )
where uptime:= system is up and non-buggy?
Predictions
• JSON will be the most popular building block for non-relational data models
• Tunable consistency in all the products
• Some SQL in these products!
Questions?
Thank you
[email protected]
@dmerr
dmerr.tumblr.com
@mongodb
Download: www.mongodb.org
10gen is hiring in SF and NYC – [email protected]