London Web Performance: Couchbase meetup
TRANSCRIPT
1
Couchbase: Distributed Document Database
Perry Krug, Sr. Solutions Architect
2
Company Background
• Leading NoSQL database company
• Open source development and distribution model
• Provides an easy-to-develop, easy-to-deploy, high-performance, easily scalable document database
• Focused on internet and mobile applications and cloud computing environments
• Most mature, reliable and widely deployed solution
– 1000s of production deployments worldwide
• Located in Silicon Valley (Mountain View, CA)
– 80 employees including 50 in engineering/product
– Round C company; VCs include Accel Partners, Mayfield, North Bridge, and Ignition
3
Paid Production Deployments (partial list)
4
Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
Couchbase Server (a.k.a. Membase)
Simple. Fast. Elastic. NoSQL.
5
RDBMS HAS DOMINATED FOR 40 YEARS BUT NO LONGER BEST SOLUTION FOR MANY APPS
“Relational database technology has served us well for 40 years, and will likely continue to do so for the foreseeable future to support transactions requiring ACID guarantees. But a large, and increasingly dominant, class of software systems and data do not need those guarantees. Much of the data manipulated by Web applications have less strict transactional requirements but, for lack of a practical alternative, many IT teams continue to use relational technology, needlessly tolerating its cost and scalability limitations. For these applications and data, distributed document cache and database technologies such as Couchbase’s provide a promising alternative.”
Carl Olofson, IDC Research Vice President, Information and Data Management
6
Modern Interactive Software Architecture
Application Scales Out: Just add more commodity web servers
Database Scales Up: Get a bigger, more complex server
Expensive & disruptive sharding, doesn’t perform at web scale
7
Data Layer Matches Application Logic Tier Architecture
Application Scales Out: Just add more commodity web servers
Database Scales Out: Just add more commodity data servers
Scaling out flattens the cost and performance curves
• Horizontally scalable with auto-sharding
• High performance at web scale
• Schema-less for flexibility
8
Couchbase NoSQL: Simple, Fast, Elastic
• Easily scale apps to an “infinite” number of users
– Simply add nodes with a single click
– Never need to change your application to scale
– Simple development with memcached API
• High performance with predictably low latency
– Sub-millisecond reads and writes
– No drop in performance as app scales
• Schema-less document database
– Flexibility to meet rapidly changing market requirements
– Roadmap: indexing and querying similar to RDBMS capabilities
• Low cost solution that economically scales with app
9
PERFORMANCE
10
Key results of Cisco and Solarflare Benchmark
Couchbase Server demonstrates
• Consistent sub-millisecond latency for mixed workload
• High throughput
• Linear scalability
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
11
Your secret weapon: Sub-millisecond AND consistent latency
[Chart: latency (microseconds) vs. object size (bytes)]
Consistently low latencies in microseconds for varying document sizes with a mixed workload
12
Your secret weapon: Sub-millisecond AND consistent latency
[Chart: operations per second vs. number of servers in cluster]
High throughput with 1.4 GB/sec data transfer rate using 4 servers
Linear throughput scalability
13
SCALE
14
Draw Something by OMGPOP
15
Draw Something “goes viral” 3 weeks after launch
[Chart: Draw Something by OMGPOP, daily active users (millions, axis 2 to 16), by date from 2/6 to 3/21]
16
As usage grew, game data went non-linear.
[Chart: Draw Something by OMGPOP, daily active users (millions, axis 2 to 16), by date from 2/6 to 3/21]
By March 19, there were over 30,000,000 downloads of the app, over 5,000 drawings being stored per second, over 2,200,000,000 drawings stored, over 105,000 database transactions per second, and over 3.3 terabytes of data stored.
Instagram (7.5M in 5 wks)
17
The game exploded. But Couchbase did not.
Without a second of downtime, and while sustaining front-end performance, the cluster was continuously expanded to support growth, absorbing frequent server hardware failures.
                     February 6   February 13   February 20   February 27   March 5       March 12    March 19
Drawings/second      0            3             50            333           1660          3000        5400
Total drawings       0            5 million     12 million    50 million    500 million   1 billion   2.2 billion
R/W latency (usec)   30           40            32            31            38            29          34
Servers              6            6             6             18            54            72          90
18
In contrast.
[Chart: The Simpsons: Tapped Out, daily active users (millions, axis 2 to 16), by date from 2/6 to 3/21]
#2 Free app on iPad
#3 Free app on iPhone
19
NO SCHEMA
20
Document database
{
  "_id": "brewery_Cleveland_ChopHouse_and_Brewery",
  "_rev": "1-00000061480b50910000000000000000",
  "city": "Cleveland",
  "updated": "2010-07-22 20:00:20",
  "code": "44113",
  "name": "Cleveland ChopHouse and Brewery",
  "country": "United States",
  "phone": "1-216-623-0909",
  "state": "Ohio",
  "address": ["824 West St.Clair Avenue"],
  "geo": {
    "loc": ["-81.6994", "41.4995"],
    "accuracy": "ROOFTOP"
  },
  "$expiration": 0,
  "$flags": 0
}
• JSON objects
• Flexible schema
21
COUCHBASE SOLUTION: “THE BASICS”
22
Basic Operation – scale out
Docs distributed evenly across servers in the cluster
Each server stores both active & replica docs
– Only one server is active for a given doc at a time
Client library provides app with simple interface to database
Cluster map tells the client library which server each doc is on
– App never needs to know
App reads, writes, updates docs
Multiple app servers can access the same document at the same time
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, send read/write/update requests to a Couchbase Server cluster of three servers. Each server holds active docs and replica docs (Doc 1 through Doc 9). User-configured replica count = 1.]
23
Add Nodes
Two servers added to cluster
– One-click operation
Docs automatically rebalanced across cluster
– Even distribution of docs
– Minimum doc movement
Cluster map updated
App database calls now distributed over larger # of servers
[Diagram: the same cluster after Server 4 and Server 5 are added. Active and replica docs are spread across five servers, and both app servers hold a cluster map. User-configured replica count = 1.]
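The "even distribution, minimum doc movement" idea on this slide can be sketched as a greedy loop. This is only an illustration under stated assumptions: `rebalance`, the `owners` list, and the server names are hypothetical, every current owner is assumed to appear in `servers`, and a real rebalance also has to move replicas and stream the data itself.

```javascript
// Sketch only: a greedy rebalance over a hypothetical ownership list.
// owners[v] is the server that holds the active copy of partition v.
function rebalance(owners, servers) {
  const next = owners.slice();
  const counts = new Map(servers.map((s) => [s, 0]));
  for (const s of next) counts.set(s, counts.get(s) + 1);

  let moved = 0;
  for (;;) {
    // Find the most and least loaded servers.
    let max = servers[0];
    let min = servers[0];
    for (const s of servers) {
      if (counts.get(s) > counts.get(max)) max = s;
      if (counts.get(s) < counts.get(min)) min = s;
    }
    // Stop once no server holds more than one partition above any other.
    if (counts.get(max) - counts.get(min) <= 1) break;

    // Move one partition from the fullest server to the emptiest one.
    next[next.indexOf(max)] = min;
    counts.set(max, counts.get(max) - 1);
    counts.set(min, counts.get(min) + 1);
    moved++;
  }
  return { owners: next, moved };
}

// Three servers with 4 partitions each, then two servers are added.
const before = ["s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3"];
const after = rebalance(before, ["s1", "s2", "s3", "s4", "s5"]);
console.log(after.moved, after.owners);
```

In this run only 4 of the 12 partitions change owner; everything else stays where it was, which is the property the slide calls minimum doc movement.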
24
Fail Over Node
App servers happily accessing docs on Server 3
Server fails
App server requests to Server 3 fail
Cluster detects server has failed
– Promotes replicas of docs to active
– Updates cluster map
App server requests for docs now go to appropriate server
Typically rebalance would follow
[Diagram: the five-server cluster with Server 3 failed. Replica docs on the remaining servers are promoted to active, and both app servers hold a cluster map. User-configured replica count = 1.]
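The failover steps above (promote replicas, update the cluster map) can be sketched on a toy map. The map layout, the `failOver` helper, and the server names are assumptions made for illustration; the real cluster manager does considerably more.

```javascript
// Sketch only: the map layout and server names are illustrative assumptions.
// map[v] = { active, replica } for partition v, with replica count = 1.
function failOver(map, failedServer) {
  return map.map(({ active, replica }) => {
    if (active === failedServer) {
      // Promote the replica to active; no replica exists until a rebalance.
      return { active: replica, replica: null };
    }
    if (replica === failedServer) {
      // The active copy is unaffected, but its replica is gone.
      return { active, replica: null };
    }
    return { active, replica };
  });
}

const clusterMap = [
  { active: "server1", replica: "server2" },
  { active: "server2", replica: "server3" },
  { active: "server3", replica: "server1" },
  { active: "server3", replica: "server2" },
];
const updated = failOver(clusterMap, "server3");
console.log(updated);
```

After the call, no entry points at the failed server, but some partitions are left without a replica, which is why a rebalance would typically follow.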
25
Couchbase Server 2.0
• Next major release of Couchbase Server
• Currently in Developer Preview, approaching Beta and GA
What’s new:
• New storage engine technology (append-only B-tree)
• Indexing and querying
• Incremental MapReduce
• Cross data center replication
• Better memory management, large data sets, and other technological improvements
• Fully backwards compatible with existing Couchbase Server
26
Couchbase Server 2.0 Architecture
[Architecture diagram. Recoverable labels:]
Data Manager side: Memcached, Membase EP Engine, storage interface, CouchStore, auto compaction, Couch API, Moxi
Cluster Manager side (Erlang/OTP): heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager, http REST management API / Web UI
Scope labels: "on each node", "one per cluster"
Ports: 8091 HTTP; 8092 Couch View; 11210 Memcapable 2.0; 11211 Memcapable 1.0; 4369 Erlang port mapper; 21100 - 21199 Distributed Erlang
Other labels: Membase, CouchBase, Distributed Indexing
27
Couchbase Server 2.0 Architecture
[Same architecture diagram as slide 26. Labels that differ: "Memcached Interface" in place of "Memcached"; no "Data Manager" label.]
28
Couchbase Server 2.0 Architecture
[Same architecture diagram as slide 26. Labels that differ: "Memcached Interface" in place of "Memcached"; "EP Engine" in place of "Membase EP Engine"; no "Membase", "Cluster Manager", or "Data Manager" labels.]
29
Partitioning The Data – vbucket map
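A minimal sketch of how a client could use a vBucket map to find the server for a key. It assumes a CRC-32 hash of the key taken modulo the number of vBuckets; the 8-entry map, the server names, and the `locate` helper are made up for illustration, and the exact hash and map layout used by the Couchbase client library may differ.

```javascript
// Sketch only: NUM_VBUCKETS, the server names, and the map layout are
// illustrative assumptions, not values taken from a real cluster.
const NUM_VBUCKETS = 8;

// vbucketMap[v] = [activeServer, replicaServer] for vBucket v (replica count = 1).
const vbucketMap = [
  ["server1", "server2"], ["server2", "server3"], ["server3", "server1"],
  ["server1", "server3"], ["server2", "server1"], ["server3", "server2"],
  ["server1", "server2"], ["server2", "server3"],
];

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320) over UTF-8 bytes.
function crc32(text) {
  let crc = 0xffffffff;
  for (const byte of new TextEncoder().encode(text)) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hash the key to a vBucket, then look the vBucket up in the map.
function locate(key) {
  const vbucket = crc32(key) % NUM_VBUCKETS;
  const [active, replica] = vbucketMap[vbucket];
  return { vbucket, active, replica };
}

console.log(locate("brewery_Cleveland_ChopHouse_and_Brewery"));
```

Because the key always hashes to the same vBucket, adding or removing servers only requires changing the map, not the keys or the application.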
30
Indexing and querying
• Built-in incremental MapReduce
• Map functions are written in JavaScript and executed on V8
• Index is built incrementally as mutations stream in
• Queries run in a scatter/gather fashion
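The scatter/gather query pattern can be sketched as follows. Each "node" here is just an in-memory object holding index rows for its own docs; the node contents, `queryNode`, and `query` are hypothetical names for illustration, and a real query would go over the network to each server.

```javascript
// Sketch only: each node holds index rows for its own docs, sorted by key.
// Node contents and names are made up for illustration.
const nodes = [
  { name: "server1", index: [{ key: "Belgium", id: "doc4" }, { key: "United States", id: "doc2" }] },
  { name: "server2", index: [{ key: "Germany", id: "doc6" }] },
  { name: "server3", index: [{ key: "Canada", id: "doc7" }, { key: "Japan", id: "doc1" }] },
];

// Scatter: ask one node for up to `limit` rows at or after startKey.
function queryNode(node, startKey, limit) {
  return node.index.filter((row) => row.key >= startKey).slice(0, limit);
}

// Gather: merge the partial results, re-sort by key, and apply the limit again.
function query(startKey, limit) {
  const partials = nodes.map((node) => queryNode(node, startKey, limit));
  return partials
    .flat()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .slice(0, limit);
}

console.log(query("C", 3));
```

Each node applies the limit locally so it never returns more rows than the final answer could need; the limit is then applied once more after merging.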
31
Incremental MapReduce using JavaScript
{
  "_id": "brewery_Cleveland_ChopHouse_and_Brewery",
  "_rev": "1-00000061480b50910000000000000000",
  "city": "Cleveland",
  "updated": "2010-07-22 20:00:20",
  "code": "44113",
  "name": "Cleveland ChopHouse and Brewery",
  "country": "United States",
  "phone": "1-216-623-0909",
  "state": "Ohio",
  "address": ["824 West St.Clair Avenue"],
  "geo": {
    "loc": ["-81.6994", "41.4995"],
    "accuracy": "ROOFTOP"
  },
  "$expiration": 0,
  "$flags": 0
}
• Document from our built-in sample beer database
32
Map function
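function (doc) {
  // Emit the most specific location key available for each document.
  // Note: && is used so that every listed field must be present; the comma
  // operator would only test the last field.
  if (doc.country && doc.state && doc.city) {
    emit([doc.country, doc.state, doc.city], 1);
  } else if (doc.country && doc.state) {
    emit([doc.country, doc.state], 1);
  } else if (doc.country) {
    emit([doc.country], 1);
  }
}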
• Map functions
REST call: http://db1.couchbase.com:8092/beer-sample/_design/dev_beer/_view/by_location?limit=10
33
Reduce functions
• Built-in reduce functions
– _count
– _sum
– _stats ({"sum": 1411, "count": 1411, "min": 1, "max": 1, "sumsqr": 1411})
• Development procedure
– Develop against a subset of the data
– Build the index on the entire cluster
– Promote a dev_ view to production
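To make the built-in reducers concrete, here is a small local stand-in for the view engine. It is a sketch under assumptions: `runMap`, `stats`, and the sample docs are hypothetical, `emit` is passed in as a parameter rather than provided by the server, and the map function tests its fields with `&&`.

```javascript
// Sketch only: a local stand-in for the server's view engine. The `emit`
// collector and the sample docs are assumptions made for illustration.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) mapFn(doc, emit);
  return rows;
}

// Emits one row per doc, keyed by the most specific location available.
function byLocation(doc, emit) {
  if (doc.country && doc.state && doc.city) {
    emit([doc.country, doc.state, doc.city], 1);
  } else if (doc.country && doc.state) {
    emit([doc.country, doc.state], 1);
  } else if (doc.country) {
    emit([doc.country], 1);
  }
}

// The quantities reported by _sum, _count, and _stats over emitted values.
function stats(values) {
  return {
    sum: values.reduce((a, b) => a + b, 0),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((a, b) => a + b * b, 0),
  };
}

const docs = [
  { country: "United States", state: "Ohio", city: "Cleveland" },
  { country: "United States", state: "Ohio" },
  { country: "Belgium" },
  { name: "no location" },
];
const rows = runMap(byLocation, docs);
console.log(stats(rows.map((row) => row.value)));
```

Because every emitted value is 1, sum, count, and sumsqr all come out equal, which matches the shape of the _stats example on the slide.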
34
Indexing and Querying
Indexing work is distributed amongst nodes
– Large data set possible
– Parallelize the effort
Each node has index for data stored on it
Queries combine the results from required nodes
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, send a query to the three-server cluster and receive a response. Each server holds active docs and replica docs. User-configured replica count = 1.]
35
Cross Data Center Replication
Data close to users
Multiple locations for disaster recovery
Independently managed clusters serving local data
[Diagram: US data center, Europe data center, and Asia data center, linked by replication]
36
Integration to Analytics systems
Use the cross data center interface
– Agnostic to topology changes
– De-duplication
– Effective changes feed of the entire cluster
[Diagram: the three-server Couchbase Server cluster (active and replica docs, user-configured replica count = 1) feeding a cross data center connector]
Changes feed to be consumed by any destination
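The de-duplication mentioned on this slide can be pictured as keeping only the newest mutation per document before handing the feed to a consumer. This is a sketch under assumptions: the `{ key, seq, value }` mutation shape and the `dedupe` helper are hypothetical and not the connector's actual format.

```javascript
// Sketch only: the mutation shape ({ key, seq, value }) is an assumption.
// Keeps the latest mutation per key, ordered by sequence number.
function dedupe(mutations) {
  const latest = new Map();
  for (const m of mutations) {
    const seen = latest.get(m.key);
    if (!seen || m.seq > seen.seq) latest.set(m.key, m);
  }
  return [...latest.values()].sort((a, b) => a.seq - b.seq);
}

const feed = [
  { key: "doc1", seq: 1, value: "a" },
  { key: "doc2", seq: 2, value: "b" },
  { key: "doc1", seq: 3, value: "c" },
];
console.log(dedupe(feed));
```

A downstream system then sees one effective change per document rather than every intermediate write.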
37
Couchbase and Hadoop Integration
• Support large-scale analytics on application data by streaming data from Couchbase to Hadoop
– Real-time integration using Flume
– Batch integration using Sqoop
• Examples
– Various game statistics (e.g., monthly / daily / hourly rankings)
– Analyze game patterns from users to enhance various game metrics
[Diagram labels: memcached protocol listener/sender, engine interface, Couchbase Storage Engine, TAP, Sqoop]
38
Couchbase Client SDKs
Java Client SDK
.Net SDK
PHP SDK
Ruby SDK
Python SDK
[Diagram layers: User Code, Java client API, spymemcached connection, HTTP CouchDB connection, Couchbase Server]
CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
// this is all the same as before
cb.set("hello", 0, "world");
cb.get("hello");
// keys is a Collection<String> of document keys
Map<String, Object> manyThings = cb.getBulk(keys);
// accessing a view
View view = cb.getView("design_document", "my_view");
Query query = new Query();
query.setRange("abegin", "theend");
http://www.couchbase.org/code
39
Couchbase Demonstration
• Couchbase ServerTemplate demo
– Starting with one database node under load
– Dynamically scaling to two database nodes
– Easy management and monitoring
– Not possible with any other database technology
[Diagram: application user, web application server, Couchbase servers in EC2 or the datacenter]
41
COUCHBASE CUSTOMERS
42
Paid Production Deployments – Social Gaming
43
Paid Production Deployments – Key Segments
Ad Platforms
Social Networks
44
Production Deployments – Key Industry Segments
Online Biz Services
Online Media
E-Commerce
45
Production Deployments – Key Industry Segments
HealthCare
Military/Government
Communications
46
Production Deployments – Key Industry Segments
Online Education
Web Design
Financial Services
47
Production Deployments – Key Industry Segments
Software
Security
48
Production Deployments – Enterprises