the nosql ecosystem

69
The NoSQL Ecosystem Adam Marcus MIT CSAIL [email protected] / @marcua

Upload: yarapavan

Post on 12-May-2015

1.193 views

Category:

Technology


0 download

DESCRIPTION

HPTS 2011 preso by Adam Marcus from CSAIL, MIT

TRANSCRIPT

Page 1: The NoSQL Ecosystem

The NoSQL Ecosystem

Adam MarcusMIT CSAIL

[email protected] / @marcua

Page 2: The NoSQL Ecosystem

About Me

● Social Computing + Database Systems● Easily Distracted: Wrote The NoSQL Ecosystem in

The Architecture of Open Source Applications 1

1 http://www.aosabook.org/en/nosql.html

Page 3: The NoSQL Ecosystem

History, Compressed

Late 1990s Today

Page 4: The NoSQL Ecosystem

History, Compressed

Late 1990s Today

Buy RAM

Page 5: The NoSQL Ecosystem

History, Compressed

Late 1990s

Wrap RDBMSs

Today

Buy RAM

Page 6: The NoSQL Ecosystem

History, Compressed

NoSQL

Late 1990s

Wrap RDBMSs

Today

Buy RAM

Page 7: The NoSQL Ecosystem

History, Compressed

NoSQL

Late 1990s

Wrap RDBMSs

Today

Buy RAM

Not Only SQL

Page 8: The NoSQL Ecosystem

History, Compressed

NoSQL

Late 1990s

Wrap RDBMSs

Today

Buy RAM Commercialize

Page 9: The NoSQL Ecosystem

History, Compressed

NoSQL

Late 1990s

Wrap RDBMSs

Today

Buy RAM Commercialize

Page 10: The NoSQL Ecosystem

The List, So You Don't Yell at Me

Cassandra

HBase

Voldemort

Riak

RedisMongoDB

HyperTable

Neo4j

HyperGraphDBDEX

InfoGrid

VertexDB

Sones

CouchDB

AllegroGraph

BerkeleyDB

Oracle NoSQLMemcacheDBTokyo Cabinet

FlockDB

Page 11: The NoSQL Ecosystem

The List, So You Don't Yell at Me

Cassandra

HBase

Voldemort

Riak

RedisMongoDB

HyperTable

Neo4j

HyperGraphDBDEX

InfoGrid

VertexDB

Sones

CouchDB

AllegroGraph

BerkeleyDB

Oracle NoSQLMemcacheDBTokyo Cabinet

FlockDB

Marcus' Law of Databases:

The number of persistence options doubles every 1.5 years

Page 12: The NoSQL Ecosystem

The List, So You Don't Yell at Me

Cassandra

HBase

Voldemort

Riak

RedisMongoDB

HyperTable

Neo4j

HyperGraphDBDEX

InfoGrid

VertexDB

Sones

CouchDB

AllegroGraph

BerkeleyDB

Oracle NoSQLMemcacheDBTokyo Cabinet

FlockDB

Marcus' Law of Databases:

The number of persistence options doubles every 1.5 years

Corollary:I'm sorry I missed your

persistence option

Page 13: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 14: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 15: The NoSQL Ecosystem

Conventional Wisdom on NoSQL

● Key-based data model● Sloppy schema● Single-key transactions● In-app joins● Eventually consistent

Page 16: The NoSQL Ecosystem

Exceptions are More Interesting

● Data Model● Query Model● Transactions● Consistency

Page 17: The NoSQL Ecosystem

Data Model

● Usually key-based...● Column-families● Documents● Data structures

Page 18: The NoSQL Ecosystem

Data Model

● Usually key-based...● Column-families● Documents● Data structures

● ...but not always● Graph stores

Page 19: The NoSQL Ecosystem

Query Model

● Redis: data structure-specific operations● CouchDB, Riak: MapReduce● Cassandra, MongoDB: SQL-like languages, no

joins or transactions

Page 20: The NoSQL Ecosystem

Query Model

● Redis: data structure-specific operations● CouchDB, Riak: MapReduce● Cassandra, MongoDB: SQL-like languages, no

joins or transactions● Third-party

● High-level: PigLatin, HiveQL● Library: Cascading, Crunch● Streaming: Flume, Kafka, S4, Scribe

Page 21: The NoSQL Ecosystem

Transactions

● Full ACID for single key● Redis: multi-key single-node transactions

Page 22: The NoSQL Ecosystem

Consistency

● Strong: Appears that all replicas see all writes● Eventual: Replicas may have different, divergent versions

Page 23: The NoSQL Ecosystem

Consistency

● Strong: Appears that all replicas see all writes● Eventual: Replicas may have different, divergent versions

● Dynamo: strong or eventual (quorum size)

Page 24: The NoSQL Ecosystem

Consistency

● Strong: Appears that all replicas see all writes● Eventual: Replicas may have different, divergent versions

● Dynamo: strong or eventual (quorum size)

● PNUTs: Timeline consistency● ...many consistency models!

See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html

Page 25: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 26: The NoSQL Ecosystem

NoSQL Use-cases

● Cassandra● HBase● MongoDB

Page 27: The NoSQL Ecosystem

Cassandra

● BigTable data model: key column family● Dynamo sharding model: consistent hashing● Eventual or strong consistency

Page 28: The NoSQL Ecosystem

Cassandra at Netflix

● Transitioned from Oracle● Store customer profiles, customer:movie watch

log, and detailed usage logging

Note: no multi-record locking (e.g., bank transfer)

In-datacenter: 3 replicas, per-app consistency

read/write quorum = 1, ~1ms latency

read/write quorum = 2, ~3ms latency

“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

Page 29: The NoSQL Ecosystem

Cassandra at Netflix

● Transitioned from Oracle● Store customer profiles, customer:movie watch

log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer)

In-datacenter: 3 replicas, per-app consistency

read/write quorum = 1, ~1ms latency

read/write quorum = 2, ~3ms latency

“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

Page 30: The NoSQL Ecosystem

Cassandra at Netflix

● Transitioned from Oracle● Store customer profiles, customer:movie watch

log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer)

● In-datacenter: 3 replicas, per-app consistency

“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

Page 31: The NoSQL Ecosystem

Cassandra at Netflix (cont'd)

● Benefit: async inter-datacenter replication● Benefit: no downtime for schema changes● Benefit: hooks for live backups

Page 32: The NoSQL Ecosystem

HBase

● Data model: key column family● Sharding model: range partitioning● Strong consistency

Applications

Logging events/crawls, storing analytics

Twitter: replicate data from MySQL, Hadoop analytics

Facebook Messages

Page 33: The NoSQL Ecosystem

HBase

● Data model: key column family● Sharding model: range partitioning● Strong consistency

● Applications● Logging events/crawls, storing analytics● Twitter: replicate data from MySQL, Hadoop analytics● Facebook Messages

Page 34: The NoSQL Ecosystem

HBase for Facebook Messages

● Cassandra/Dynamo eventual consistency was difficult to program against

Benefit: simple consistency model

Benefit: flexible data model

Benefit: simple sharding, load balancing, replication

Page 35: The NoSQL Ecosystem

HBase for Facebook Messages

● Cassandra/Dynamo eventual consistency was difficult to program against

● Benefit: simple consistency model● Benefit: flexible data model● Benefit: simple sharding, load balancing,

replication

Page 36: The NoSQL Ecosystem

MongoDB

● Document-based data model● Range-based partitioning● Consistency depends on how you use it

Page 37: The NoSQL Ecosystem

MongoDB: Two use-cases

● Archiving at Craigslist● 2.2B historical posts, semi-structured● Relatively large blobs: avg 2KB, max > 4 MB

Page 38: The NoSQL Ecosystem

MongoDB: Two use-cases

● Archiving at Craigslist● 2.2B historical posts, semi-structured● Relatively large blobs: avg 2KB, max > 4 MB

● Checkins at Foursquare● Geospatial indexing● Small location-based updates, sharded on user

Page 39: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 40: The NoSQL Ecosystem

Takeaways

● Developer accessibility● Ecosystem of reuse● Soft spot

Page 41: The NoSQL Ecosystem

Developer Accessibility

Page 42: The NoSQL Ecosystem

Developer Accessibilitydevops

@DEVOPS_BORAT

Page 43: The NoSQL Ecosystem

Developer Accessibilitydevops

@DEVOPS_BORAT

self-taught dev

Page 44: The NoSQL Ecosystem

A Different Five-Minute Rule

What does a first-time user of your system experience in their first five minutes? (idea credit: Justin Sheehy, Basho)

Page 45: The NoSQL Ecosystem

Redis

http://simonwillison.net/2009/Oct/22/redis/

Page 46: The NoSQL Ecosystem

PostgreSQL

http://www.postgresql.org/docs/9.1/interactive/sql-createtable.html

Page 47: The NoSQL Ecosystem

Cassandra

http://www.datastax.com/docs/0.7/getting_started/using_cli

Page 48: The NoSQL Ecosystem

Accessibility is Forever

Beyond five minutes: ● changing schemas● scaling up● modifying topology

Page 49: The NoSQL Ecosystem

Accessibility is Forever

Beyond five minutes: ● changing schemas● scaling up● modifying topology

An accessible data store will ease first-time users in, and support

experienced users as they expand

Page 50: The NoSQL Ecosystem

Ecosystem of Reuse

Page 51: The NoSQL Ecosystem

Spectrum of Reusability

Monolithic System System Kernel

Page 52: The NoSQL Ecosystem

Spectrum of Reusability

MongoDBRedis

Use as provided

Monolithic System System Kernel

Page 53: The NoSQL Ecosystem

Spectrum of Reusability

MongoDBRedis

CassandraVoldemort

Use as provided

Pick between components

Monolithic System System Kernel

Page 54: The NoSQL Ecosystem

Spectrum of Reusability

MongoDBRedis

CassandraVoldemort

Use as provided

Pick between components

ZooKeeperLevelDB

Reusablecomponents

Monolithic System System Kernel

Page 55: The NoSQL Ecosystem

Spectrum of Reusability

MongoDBRedis

CassandraVoldemort

Use as provided

Pick between components

riak_coreZooKeeperLevelDB

Reusablecomponents

Build yourown system

Monolithic System System Kernel

Page 56: The NoSQL Ecosystem

And finally, a soft spot

Page 57: The NoSQL Ecosystem

polyglot persistence=

right tool for right task

Page 58: The NoSQL Ecosystem

Polyglot persistence: Where's the data?

“HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley

Page 59: The NoSQL Ecosystem

Polyglot persistence: Where's the data?

“HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley

Page 60: The NoSQL Ecosystem

Polyglot persistence: Where's the data?

“HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley

Aid developers in reasoning about data consistency across multiple storage engines

Page 61: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 62: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?

Page 63: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?

Page 64: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?● Accessibility: RDBMS five-minute usability?

Page 65: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?

Page 66: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?

Page 67: The NoSQL Ecosystem

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?● Future: Next-generation NoSQL stores?

Page 68: The NoSQL Ecosystem

Thank You!

Adam [email protected] / @marcua

● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in

datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?● Future: Next-generation NoSQL stores?

Page 69: The NoSQL Ecosystem

Photo Credits

http://www.flickr.com/photos/dhannah/4577657955/

http://www.flickr.com/photos/27316226@N02/3000888100/sizes/m/in/photostream/

http://www.texample.net/tikz/examples/valentine-heart/

http://voltdb.com/sites/default/files/VCE_button.png