ccb12 navigating the nosql ladnscape

44
1 Navigating the NoSQL Landscape Tugdual “tug” Gral Technology Evangel @tgrall

Upload: couchbase

Post on 10-May-2015

483 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CCB12 Navigating the NoSQL Ladnscape

1

Navigating the NoSQL Landscape

Tugdual “tug” GrallTechnology Evangelist@tgrall

Page 2: CCB12 Navigating the NoSQL Ladnscape

2

What we’ll talk about

• Why RDBMS are not enough?

• What are the different NoSQL taxonomies?

• Which “NoSQL” is right for me?

Page 3: CCB12 Navigating the NoSQL Ladnscape

3

Growth is the New Reality

• Instagram gained nearly 1 million users overnight when they expanded to Android

Page 4: CCB12 Navigating the NoSQL Ladnscape

4

Does it work with RDBMS backend?

Application Scales OutJust add more commodity web servers

Database Scales UpGet a bigger, more complex server

Note – Relational database technology is great for what it is great for, but it is not great for this.

Page 5: CCB12 Navigating the NoSQL Ladnscape

5

Some alternative to scale out your RDBMS

Scale out your RDBMS• Run many SQL Servers• Data are sharded

(most of the time using client code)• Memcached for faster response time

Page 6: CCB12 Navigating the NoSQL Ladnscape

6

Scale Out with RDBMS

Is this a good approach to scale?• Lot of components to deploy• Scale by Hand

– Caching– Sharding/Replication

Learn From Others This Scenario Costs Time and Money. Scaling SQL is potentially disastrous when going Viral: Very risky time for major code changes and migrations...You have no Time when skyrocketing up!

Page 7: CCB12 Navigating the NoSQL Ladnscape

7

Lacking market solutions, users forced to invent

DynamoOctober 2007

CassandraAugust 2008

VoldemortFebruary 2009

BigtableNovember 2006

Very few organizations want to (fewer can) build and maintain database software technology.But every organization building interactive web applications needs this technology.

• No schema required before inserting data• No schema change required to change data format• Auto-sharding without application participation• Distributed queries• Integrated main memory caching• Data synchronization (mobile, multi-datacenter)

Page 8: CCB12 Navigating the NoSQL Ladnscape

8

Other

All of these

Costs

High latency/low performance

Inability to scale out data

Lack of flexibility/rigid schemas

11%

12%

16%

29%

35%

49%

Source: Couchbase NoSQL Survey, December 2011, n=1351

What is the biggest data management problem driving your use of NoSQL in the coming year?

Survey: Schema inflexibility #1 adoption driver

Page 9: CCB12 Navigating the NoSQL Ladnscape

9

NoSQL database matches application logic tier architectureData layer now scales with linear cost and constant performance.

Application Scales OutJust add more commodity web servers

Database Scales OutJust add more commodity data servers

Scaling out flattens the cost and performance curves.

NoSQL Database Servers

Page 10: CCB12 Navigating the NoSQL Ladnscape

10

NOSQL TAXONOMY

Page 11: CCB12 Navigating the NoSQL Ladnscape

11

The Key-Value Store – the foundation of NoSQL

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Page 12: CCB12 Navigating the NoSQL Ladnscape

12

Memcached – the NoSQL precursor

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Memcached

In-memory onlyLimited set of operationsBlob Storage: Set, Add, Replace, CASRetrieval: GetStructured Data: Append, Increment

“Simple and fast.”

Challenges: cold cache, disruptive elasticity

Page 13: CCB12 Navigating the NoSQL Ladnscape

13

NoSQL catalog

Key-Value

Memcached

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 14: CCB12 Navigating the NoSQL Ladnscape

14

Redis – More “Structured Data” commands

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

“Data Structures”BlobListSet

Hash…

Redis

Disk Persistence (eventual consistency on the disk)Vast set of operationsBlob Storage: Set, Add, Replace, CASRetrieval: Get, Pub-SubStructured Data: Strings, Hashes, Lists, Sets,Sorted listsExample operations for a SetAdd, count, subtract sets, intersection, is member, atomic move from one set to another

Page 15: CCB12 Navigating the NoSQL Ladnscape

15

NoSQL catalog

Key-Value

Memcached Redis

Data Structure

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 16: CCB12 Navigating the NoSQL Ladnscape

16

Membase – From key-value cache to database

Disk-based with built-in memcached cacheCache refill on restartMemcached compatible (drop in replacement)Highly-available (data replication)Add or remove capacity to live cluster

“Simple, fast, elastic.”

MembaseKey

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Page 17: CCB12 Navigating the NoSQL Ladnscape

17

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 18: CCB12 Navigating the NoSQL Ladnscape

18

Couchbase – document-oriented database

Key

{ “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ]}

Auto-shardingDisk-based with built-in memcached cacheCache refill on restartMemcached compatible (drop in replace)Highly-available (data replication)Add or remove capacity to live cluster

When values are JSON objects (“documents”):Create indices, views and query against the views

JSONOBJECT

(“DOCUMENT”)

Couchbase

Page 19: CCB12 Navigating the NoSQL Ladnscape

19

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure Document

Couchbase

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 20: CCB12 Navigating the NoSQL Ladnscape

20

MongoDB – Document-oriented database

Key

{ “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ]}

Disk-based with in-memory “caching”BSON (“binary JSON”) format and wire protocolMaster-slave replicationAuto-shardingValues are BSON objectsSupports ad hoc queries – best when indexed

BSONOBJECT

(“DOCUMENT”)

MongoDB

Page 21: CCB12 Navigating the NoSQL Ladnscape

21

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure Document

MongoDB

Couchbase

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 22: CCB12 Navigating the NoSQL Ladnscape

22

Cassandra – Column overlays

Disk-based systemClustered External caching required for low-latency reads“Columns” are overlaid on the dataNot all rows must have all columnsSupports efficient queries on columnsRestart required when adding columns

CassandraKey

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Column 1

Column 2

Column 3 (not present)

Page 23: CCB12 Navigating the NoSQL Ladnscape

23

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure Document Column

MongoDB

Couchbase Cassandra

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Page 24: CCB12 Navigating the NoSQL Ladnscape

24

Neo4j – Graph database

Disk-based systemExternal caching required for low-latency readsNodes, relationships and pathsProperties on nodesDelete, Insert, Traverse, etc.

Neo4j

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Key

101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101

101100101000100010011101101100101000100010011101101100101000100010011101

OpaqueBinaryValue

Page 25: CCB12 Navigating the NoSQL Ladnscape

25

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure Document Column Graph

MongoDB

Couchbase Cassandra

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Neo4j

Page 26: CCB12 Navigating the NoSQL Ladnscape

26

NoSQL catalog

Key-Value

Memcached

Membase

Redis

Data Structure Document Column Graph

MongoDB

Couchbase Cassandra

Cach

e(m

emor

y on

ly)

Dat

abas

e(m

emor

y/di

sk)

Neo4j

HBase InfiniteGraph

Coherence

Page 27: CCB12 Navigating the NoSQL Ladnscape

27

Page 28: CCB12 Navigating the NoSQL Ladnscape

28

What about Hadoop?

Page 29: CCB12 Navigating the NoSQL Ladnscape

29

Hadoop : Big Data Swiss Army Knife

• Oozie: Workflow, coordination• Sqoop : Data connector to import/export data• Hive : SQL-Like interface• Pig : High level programming language• Mahout : Machine learning library• Whirr : Hadoop management tools for cloud services• Flume : Aggregator• Map Reduce : Framework to process large volume of data• HBase : Key Value data store• Zookeeper : Centralized configuration management• HDFS : Distributed file system

Page 30: CCB12 Navigating the NoSQL Ladnscape

30

So what? Hadoop & Couchbase

click streamevents

profiles, campaigns

profiles, real time campaign statistics

40 milliseconds to respond with the decision.

2

3

1

Page 31: CCB12 Navigating the NoSQL Ladnscape

31

WHICH ONE IS RIGHT FOR ME ?

Page 32: CCB12 Navigating the NoSQL Ladnscape

32

Other

All of these

Costs

High latency/low performance

Inability to scale out data

Lack of flexibility/rigid schemas

11%

12%

16%

29%

35%

49%

Source: Couchbase NoSQL Survey, December 2011, n=1351

What is the biggest data management problem driving your use of NoSQL in the coming year?

Survey: Schema inflexibility #1 adoption driver

Page 33: CCB12 Navigating the NoSQL Ladnscape

33

Lack of Flexibility / Rigid Schema

• Aggregate Data Models (Martin Fowler)– Flexible Data Structure– Optimized Access– Easy to distribute data

o::1001

{uid: ji22jd,customer: Ann,line_items: [ { sku: 0321293533, quan: 3, unit_price: 48.0 }, { sku: 0321601912, quan: 1, unit_price: 39.0 }, { sku: 0131495054, quan: 1, unit_price: 51.0 } ],payment: { type: Amex, expiry: 04/2001,

last5: 12345 }}

http://martinfowler.com/bliki/AggregateOrientedDatabase.html

Page 34: CCB12 Navigating the NoSQL Ladnscape

34

Use Cases

Key Value • Session Management• User Profile/Preferences• Shopping Cart

Document • Event Logging• Content Management • Web Analytics• E-Commerce Application

Columns • Event Logging• Content Management• Counters

Graph • Connected Data / Social Networks• Routing, Dispatch• Recommendations based on Social Graph

Thanks to Martin Fowler

Page 35: CCB12 Navigating the NoSQL Ladnscape

35

Production Environment

US DATA CENTER

EMEA DC

APAC DC

Page 36: CCB12 Navigating the NoSQL Ladnscape

36

Scale out your data

• Modify cluster topology should be simple– Add, Remove, Configure Nodes on a running system

• What is the impact of topology changes?– Sharding, Caching of the data– Availability of the service during cluster changes

• More hardware = More failures– Availability, reliability of the system: failover support

Page 37: CCB12 Navigating the NoSQL Ladnscape

37

Add Nodes

Two servers added to cluster One-click operation

Docs automatically rebalanced across cluster Even distribution of

docs Minimum doc

movement Cluster map updated

App database calls now distributed over larger # of servers

User Configured Replica Count = 1

Read/Write/Update Read/Write/Update

Doc 7

Doc 9

Doc 3

Active Docs

Replica Docs

Doc 6

COUCHBASE CLIENT LIBRARY

CLUSTER MAP

APP SERVER 1

COUCHBASE CLIENT LIBRARY

CLUSTER MAP

APP SERVER 2

Doc 4

Doc 2

Doc 5

SERVER 1

Doc 6

Doc 4

SERVER 2

Doc 7

Doc 1

SERVER 3

Doc 3

Doc 9

Doc 7

Doc 8 Doc 6

Doc 3

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

DOC

Doc 9

Doc 5

DOC

DOC

DOC

Doc 1

Doc 8 Doc 2

Replica Docs Replica Docs Replica Docs

Active Docs Active Docs Active Docs

SERVER 4 SERVER 5

Active Docs Active Docs

Replica Docs Replica Docs

COUCHBASE SERVER CLUSTER

Page 38: CCB12 Navigating the NoSQL Ladnscape

38

Fail Over Node

App servers happily accessing docs on Server 3

Server fails App server requests to server 3 fail Cluster detects server has failed

Promotes replicas of docs to active Updates cluster map

App server requests for docs now go to appropriate server

Typically rebalance would follow

User Configured Replica Count = 1

Doc 7

Doc 9

Doc 3

Active Docs

Replica Docs

Doc 6

COUCHBASE CLIENT LIBRARY

CLUSTER MAP

APP SERVER 1

COUCHBASE CLIENT LIBRARY

CLUSTER MAP

APP SERVER 2

Doc 4

Doc 2

Doc 5

SERVER 1

Doc 6

Doc 4

SERVER 2

Doc 7

Doc 1

SERVER 3

Doc 3

Doc 9

Doc 7 Doc 8

Doc 6

Doc 3

DOC

DOC

DOCDOC

DOC

DOC

DOC DOC

DOC

DOC

DOC DOC

DOC

DOC

DOC

Doc 9

Doc 5DOC

DOC

DOC

Doc 1

Doc 8

Doc 2

Replica Docs Replica Docs Replica Docs

Active Docs Active Docs Active Docs

SERVER 4 SERVER 5

Active Docs Active Docs

Replica Docs Replica Docs

COUCHBASE SERVER CLUSTER

Page 39: CCB12 Navigating the NoSQL Ladnscape

39

Performance

• What is my working set?• How cache is working?

– Put your data in RAM

• How to design my data model?– Aggregate Model– Easy to change

Page 40: CCB12 Navigating the NoSQL Ladnscape

40

Page 41: CCB12 Navigating the NoSQL Ladnscape

41

Management and Monitoring

• Do not forget about Operations!– Service Reliability Engineering Team will thank you!

• Manage your cluster easily:– Command Line, Administration Console to change cluster

toplogy

• Monitor “your NoSQL”– Analyze the overall status of your cluster– View and fix bottlenecks

Page 42: CCB12 Navigating the NoSQL Ladnscape

42

Conclusion

• One Size Does Not Fit All

• Overview of the the NoSQL types

• Choose the right solution– Developer Productivity– Large Scale Data

Page 43: CCB12 Navigating the NoSQL Ladnscape

43

QUESTIONS?

Page 44: CCB12 Navigating the NoSQL Ladnscape

44

Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage,

Couchbase effortlessly accommodates changing data management requirements.

Simple. Fast. Elastic. NoSQL.