Couchbase Server 101: Couchbase Connect 2015
TRANSCRIPT
©2015 Couchbase Inc. 2
Agenda
• Where does Couchbase fit in?
• Key concepts
• Operations
• Cluster-wide operations
• Look at a live cluster
Big Data = Operational + Analytic (NoSQL + Hadoop)
• Online web/mobile/IoT apps: millions of customers/consumers
• Offline, batch-oriented analytics apps: hundreds of business analysts
Couchbase meets today’s & tomorrow’s requirements
Flexible data model
Consistent performance at scale
High availability
Easy, affordable scalability
24x365
Enterprises use Couchbase to enable key objectives
• 360 Degree Customer View
• Profile Management
• Catalog
• Fraud Detection
• Content Management
• Internet of Things
• Digital Communication
• Real Time Big Data
• Mobile Applications
• Personalization
Couchbase can act as a key-value store or a document store
Key-value store:
2014-06-23-10:15am : 75F
2014-06-23-11:30am : 77F
2014-06-23-02:00pm : 82F
Document store:
0001:
{ "firstname": "Dipti", "lastname": "Borkar", "language": "English", "time_zone": "PST", "zip": 94403 }
Key - UTF-8 string up to 250 bytes
Value - 0 bytes to 20 MB (best practice: < 1 MB)
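The key and value limits above can be checked on the client side before a write. The following is a minimal sketch, not part of any Couchbase SDK: `check_item` and the constant names are hypothetical, and reading "MB" as 1024 * 1024 bytes is an assumption.

```python
MAX_KEY_BYTES = 250                       # key limit stated on the slide
MAX_VALUE_BYTES = 20 * 1024 * 1024        # "20 MB", assumed to mean 20 * 1024 * 1024 bytes
BEST_PRACTICE_VALUE_BYTES = 1024 * 1024   # "< 1 MB" best practice, same assumption


def check_item(key: str, value: bytes) -> list:
    """Return a list of problems; an empty list means the item is within the limits."""
    problems = []
    key_bytes = len(key.encode("utf-8"))  # the limit is in bytes, not characters
    if key_bytes > MAX_KEY_BYTES:
        problems.append(f"key is {key_bytes} bytes, limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        problems.append("value exceeds the 20 MB limit")
    elif len(value) >= BEST_PRACTICE_VALUE_BYTES:
        problems.append("value is at or above the 1 MB best-practice size")
    return problems


if __name__ == "__main__":
    print(check_item("0001", b'{"zip": 94403}'))
    print(check_item("\u00e9" * 126, b""))  # 126 two-byte characters = 252 bytes
```

Note that the key check counts UTF-8 bytes, so a key with non-ASCII characters can exceed the limit with fewer than 250 characters.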
Fundamentals
Document ID or Key
• Similar to primary keys in relational databases
• Documents are partitioned based on the document ID
• ID-based document lookup is extremely fast
• Must be unique
Value
• JSON
• Binary - integers, strings, booleans
• Common binary values include serialized objects, compressed XML, compressed text, encrypted values
Metadata
• CAS value (unique identifier for concurrency)
• TTL
• Flags (optional client library metadata)
• Revision #
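The CAS value in the metadata supports optimistic concurrency: a writer passes back the CAS it read, and the write is rejected if the document changed in the meantime. Below is a toy in-memory model of that idea; `ToyStore`, `CasMismatch`, and the integer counter are hypothetical and do not reflect how Couchbase generates CAS values or what any SDK's API looks like.

```python
class CasMismatch(Exception):
    """Raised when a write carries a CAS that no longer matches the stored one."""


class ToyStore:
    def __init__(self):
        self._items = {}      # key -> (value, cas)
        self._next_cas = 1    # assumption: a simple counter stands in for real CAS values

    def get(self, key):
        """Return (value, cas) for the key."""
        return self._items[key]

    def upsert(self, key, value, cas=None):
        """Store the value; if a CAS is given, it must match the current one."""
        current = self._items.get(key)
        if cas is not None and current is not None and current[1] != cas:
            raise CasMismatch(key)
        new_cas = self._next_cas
        self._next_cas += 1
        self._items[key] = (value, new_cas)
        return new_cas


if __name__ == "__main__":
    store = ToyStore()
    store.upsert("0001", {"firstname": "Dipti"})
    _, cas_a = store.get("0001")          # reader A
    _, cas_b = store.get("0001")          # reader B sees the same CAS
    store.upsert("0001", {"firstname": "Dipti", "zip": 94403}, cas=cas_a)
    try:
        store.upsert("0001", {"firstname": "D."}, cas=cas_b)   # stale CAS
    except CasMismatch:
        print("second writer must re-read and retry")
```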
Benefits of JSON
• Can represent complex objects and data structures
• Very simple notation, lightweight, compact, readable
• The most common API return type for integrations
  o Facebook, Twitter, you name it, return JSON
• Native to JavaScript (can be useful)
• Can be inserted straight into Couchbase (faster development)
• Serialization and deserialization are very fast
Storing and retrieving documents
[Diagram, clients to servers: user/application data is read from / written to documents in data buckets, which live on server nodes, that form a Couchbase cluster]
• Dynamically scalable
• Based on hash partitioning
Objects Serialized to JSON and Back
User Object
string uid
string firstname
string lastname
int age
array favorite_colors
string email
add() / get()
u::[email protected]
{ "uid": 123456, "firstname": "John", "lastname": "Smith", "age": 22, "favorite_colors": ["blue", "black"], "email": "[email protected]" }
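The round trip between an application object and its JSON document can be sketched with the Python standard library alone. This is an illustration, not SDK code: the `User` class mirrors the fields on the slide, the `to_json` / `from_json` / `doc_key` helpers are hypothetical, and the address `john@example.com` is a placeholder because the email on the slide is elided.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    uid: str
    firstname: str
    lastname: str
    age: int
    favorite_colors: list
    email: str


def doc_key(user: User) -> str:
    """Build a document ID using the "u::" prefix shown on the slide."""
    return "u::" + user.email


def to_json(user: User) -> str:
    """Serialize the object to the JSON text that would be stored as the value."""
    return json.dumps(asdict(user))


def from_json(text: str) -> User:
    """Rebuild the object from a stored JSON value."""
    return User(**json.loads(text))


if __name__ == "__main__":
    john = User("123456", "John", "Smith", 22, ["blue", "black"], "john@example.com")
    stored = to_json(john)             # what an add() would write
    print(doc_key(john), stored)
    print(from_json(stored) == john)   # what a get() would give back
```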
Couchbase provides a complete Data Management solution
• High availability cache
• Key-value store
• Document database
• Embedded database
• Sync management
Multi-purpose capabilities support a broad range of apps and use cases
Enterprises often start with cache, then broaden usage to other apps and use cases
What makes Couchbase unique?
Enterprises choose Couchbase for several key advantages
• Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
• Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform
• Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance
• Always-on availability (24x365): data replication across nodes, clusters, and data centers
Couchbase Server Architecture
DATA MANAGER
• Query engine
• Object-managed cache
• Storage engine
• 11210 / 11211: data access ports
• 8092: query API
CLUSTER MANAGER (Erlang/OTP)
• HTTP: REST management API / Web UI
• Replication, rebalance, shard state manager
• 8091: Admin Console
Single Node Operations - Write
[Diagram: the App Server writes a Doc into the Managed Cache; the Doc is placed on the Replication Queue (memory-to-memory replication to other node) and on the Disk Queue, which drains to Disk]
Single Node Operations - Read
[Diagram: the App Server issues Get Doc 1; Doc 1 is returned from the Managed Cache. Also shown: Disk Queue, Disk, and Replication Queue (memory-to-memory replication to other node)]
Single Node Operations – Cache Ejection
[Diagram: Docs 1 through 6 are held in the Managed Cache and on Disk; as the cache fills, Doc 1 is ejected from the Managed Cache and remains on Disk. Also shown: App Server, Disk Queue, and Replication Queue (memory-to-memory replication to other node)]
Single Node Operations – Cache Miss
[Diagram: the App Server issues Get Doc 1; Doc 1 is not in the Managed Cache (which holds Docs 2 through 6), so it is fetched from Disk into the Managed Cache and returned to the App Server. Also shown: Disk Queue and Replication Queue (memory-to-memory replication to other node)]
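The four single-node slides (write, read, cache ejection, cache miss) can be tied together in one toy model. This is a sketch under stated assumptions, not how the server is implemented: `ToyNode` is hypothetical, least-recently-used ejection is an assumption made for simplicity, and the queues are plain in-process deques.

```python
from collections import OrderedDict, deque


class ToyNode:
    def __init__(self, cache_capacity):
        self.cache = OrderedDict()        # managed cache, least recently used first
        self.disk = {}                    # persisted documents
        self.disk_queue = deque()         # keys waiting to be persisted
        self.replication_queue = deque()  # (key, doc) pairs waiting to go to another node
        self.dirty = set()                # keys written but not yet persisted
        self.cache_capacity = cache_capacity

    def write(self, key, doc):
        """A write lands in memory first, then is queued for disk and for replication."""
        self.cache[key] = doc
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self.disk_queue.append(key)
        self.replication_queue.append((key, doc))

    def flush(self):
        """Drain the disk queue, then eject persisted docs if the cache is over capacity."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]
            self.dirty.discard(key)
        self._eject()

    def _eject(self):
        # Only docs that are already on disk may be dropped from memory.
        for key in list(self.cache):
            if len(self.cache) <= self.cache_capacity:
                break
            if key not in self.dirty:
                del self.cache[key]

    def read(self, key):
        """Return (doc, "hit") from memory, or load from disk and return (doc, "miss")."""
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key], "hit"
        doc = self.disk[key]
        self.cache[key] = doc
        self._eject()
        return doc, "miss"


if __name__ == "__main__":
    node = ToyNode(cache_capacity=2)
    for i in (1, 2, 3):
        node.write(f"doc{i}", {"n": i})
    node.flush()                 # doc1 is ejected from the cache but stays on disk
    print(node.read("doc1"))     # cache miss: fetched from disk
    print(node.read("doc1"))     # now a hit
```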
Auto sharding – Bucket and vBuckets
• Each bucket has active and replica data sets
• Each data set has 1024 virtual buckets (vBuckets)
• Documents get logically mapped to vBuckets
• Document IDs always get hashed to the same virtual bucket
• Virtual buckets do not have a fixed physical server location
• The mapping between virtual buckets and physical servers is called the cluster map
• Each virtual bucket contains 1/1024th of the data set
[Diagram: data buckets divided into virtual buckets vB 1 ... 1024]
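The mapping from document ID to vBucket can be illustrated with a few lines of Python. This is only a sketch of the idea: CRC32 modulo 1024 is an assumption chosen for illustration, and the exact hash and bit manipulation used by real client libraries may differ. `vbucket_for` is a hypothetical name.

```python
import zlib

NUM_VBUCKETS = 1024  # from the slide


def vbucket_for(key: str) -> int:
    """Map a document ID to a vBucket number in [0, NUM_VBUCKETS)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


if __name__ == "__main__":
    for key in ("0001", "0002", "u::example"):
        print(key, "->", vbucket_for(key))
```

Because the function depends only on the key, the same document ID always lands in the same vBucket, no matter which server currently hosts that vBucket.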
Cluster Map
[Diagram: documents are read from / written to logical partitions vB1, vB2, vB3, vB4, vB5 ... vB1024 chosen by Hash function (KEY); the cluster map assigns the logical partitions to physical servers A, B, C. Adding a node to scale out produces a new cluster map]
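A cluster map can be modeled as a list with one entry per vBucket naming the server that holds it. The sketch below assumes a round-robin initial assignment and a CRC32-based hash purely for illustration; `build_cluster_map` and `server_for` are hypothetical helpers, not client library functions.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    """Assumed hash for illustration: CRC32 of the key, modulo the vBucket count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Assign vBuckets to servers round-robin; index = vBucket, value = server."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(key, cluster_map):
    """Two-step lookup: key -> vBucket (hash), vBucket -> server (cluster map)."""
    return cluster_map[vbucket_for(key)]


if __name__ == "__main__":
    cluster_map = build_cluster_map(["A", "B", "C"])
    print(Counter(cluster_map))            # vBuckets per server
    print(server_for("0001", cluster_map))
```

Only the second step changes when the cluster grows or shrinks, which is why the application never needs to know where a document physically lives.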
Multi-Node Operations
[Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, send read/write/update requests to three servers]
SERVER 1 - Active: Shard 5, Shard 2, Shard 9; Replica: Shard 4, Shard 1, Shard 8
SERVER 2 - Active: Shard 4, Shard 7, Shard 8; Replica: Shard 6, Shard 3, Shard 2
SERVER 3 - Active: Shard 1, Shard 3, Shard 6; Replica: Shard 7, Shard 9, Shard 5
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  o Only one server active for a given doc at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  o App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
Adding Nodes
[Diagram: Servers 4 and 5 join the cluster; some active and replica shards move from Servers 1, 2, and 3 to the new servers, and the cluster maps held by App Servers 1 and 2 are updated]
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
  o Even distribution of docs
  o Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
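The two rebalance goals above, even distribution and minimum movement, can be shown with a small planner over a vBucket-to-server list. This is a simplified sketch, not the server's rebalance algorithm: `rebalance` is a hypothetical function, it ignores replicas, and it simply lets each existing server keep as many of its vBuckets as its new share allows.

```python
from collections import Counter


def rebalance(cluster_map, servers):
    """Return a new vBucket -> server list over `servers`, moving only surplus vBuckets."""
    base, extra = divmod(len(cluster_map), len(servers))
    # Target share per server; the first `extra` servers hold one more vBucket.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    new_map = list(cluster_map)
    kept = Counter()
    spare = []                            # vBuckets that have to move
    for vb, owner in enumerate(cluster_map):
        if owner in target and kept[owner] < target[owner]:
            kept[owner] += 1              # stays where it is
        else:
            spare.append(vb)              # owner is over its share (or was removed)

    # Hand each spare vBucket to a server that is still below its target.
    needy = [s for s in servers for _ in range(target[s] - kept[s])]
    for vb, server in zip(spare, needy):
        new_map[vb] = server
    return new_map


if __name__ == "__main__":
    old_map = [["S1", "S2", "S3"][vb % 3] for vb in range(1024)]
    new_map = rebalance(old_map, ["S1", "S2", "S3", "S4", "S5"])
    moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
    print(Counter(new_map), "moved:", moved)
```

In this run, every vBucket that moves goes to one of the two new servers; nothing is shuffled between the three original servers.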
Managing failures
[Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, access Servers 1 through 5, which hold active and replica shards; Server 3 fails and the replicas of its active shards on other servers are promoted to active]
• App servers accessing shards
• Requests to Server 3 fail
• Cluster detects server failed
  o Promotes replicas of shards to active
  o Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
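The failover steps can be traced with the nine-shard, three-server layout from the Multi-Node Operations slide. This is a toy sketch assuming one replica per shard; `fail_over` and the `S1`/`S2`/`S3` labels are hypothetical, and real failover also involves failure detection and client notification that are not modeled here.

```python
def fail_over(active, replica, failed):
    """Promote replicas of the failed server's active shards and return the new maps."""
    new_active = dict(active)
    new_replica = dict(replica)
    for shard, server in active.items():
        if server == failed:
            new_active[shard] = replica[shard]   # replica copy becomes the active copy
            new_replica[shard] = None            # no replica until a rebalance recreates it
    for shard, server in replica.items():
        if server == failed:
            new_replica[shard] = None            # replicas on the failed server are lost
    return new_active, new_replica


if __name__ == "__main__":
    # shard -> server, as on the Multi-Node Operations slide
    active = {5: "S1", 2: "S1", 9: "S1", 4: "S2", 7: "S2", 8: "S2", 1: "S3", 3: "S3", 6: "S3"}
    replica = {4: "S1", 1: "S1", 8: "S1", 6: "S2", 3: "S2", 2: "S2", 7: "S3", 9: "S3", 5: "S3"}
    new_active, new_replica = fail_over(active, replica, "S3")
    print(new_active)    # shards 1, 3, 6 are now served by S1 and S2
    print(new_replica)   # shards without a replica show why a rebalance typically follows
```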
XDCR: Cross Data Center Replication
• Application can access both clusters (master – master)
• Scales out linearly
• Different from intra-cluster replication (“CP” versus “AP”)
XDCR: Flexible topologies
• One-one, one-many, many-one
• Differently sized and resourced clusters supported
XDCR after Write
[Diagram: on a Couchbase Server node, the App Server writes Doc 1 into the Managed Cache; Doc 1 goes to the Disk Queue and Disk, to the Replication Queue (memory-to-memory replication to other node), and to the XDCR Queue ((New in 3.0) memory-to-memory replication to remote cluster)]
Indexing and Querying Features
• Index and Query
  o Distributed indexing and querying
  o Secondary indexes of JSON document content
  o Flexible querying of indexes
• Incremental Map-Reduce
  o Distributed simple real-time analytics
  o Only considers changes due to updated data
• Full Text Search
  o Robust integration with ElasticSearch / Solr cluster
  o Flexible full text search and faceted search
View processing after write
[Diagram: on a Couchbase Server node, the App Server writes Doc 1 into the Managed Cache; Doc 1 goes to the Replication Queue (to other node), through the Disk Queue to Disk, and is also processed by the View engine]
Couchbase Server Architecture - Views
[Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, query Servers 1, 2, and 3; each server holds active and replica shards]
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
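The last two bullets describe a scatter-gather pattern: each node indexes only its own data, and a query merges the per-node results. The sketch below illustrates that pattern with sorted lists; `build_node_index` and `query` are hypothetical names, and this is not the view engine's actual data structure.

```python
import heapq


def build_node_index(docs, field):
    """Index the documents stored on one node as sorted (field value, doc ID) pairs."""
    return sorted((doc[field], doc_id) for doc_id, doc in docs.items() if field in doc)


def query(node_indexes, low, high):
    """Ask every node for entries in [low, high] and merge the sorted partial results."""
    partials = ([entry for entry in index if low <= entry[0] <= high]
                for index in node_indexes)
    return list(heapq.merge(*partials))


if __name__ == "__main__":
    node1 = {"u1": {"age": 22}, "u4": {"age": 41}}
    node2 = {"u2": {"age": 35}, "u5": {"age": 19}}
    node3 = {"u3": {"age": 28}, "u6": {"name": "no age field"}}
    indexes = [build_node_index(n, "age") for n in (node1, node2, node3)]
    print(query(indexes, 20, 40))
```

Building the three indexes is independent work, so it can run in parallel on the nodes; only the final merge needs results from all of them.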
Why SQL for NoSQL
• JSON document model provides
  o Rich structure (no assembly)
  o Structure evolution (flexible schema, seamless change)
• SQL provides
  o Query across relationships
  o Query in general
• Why SQL for JSON?
  o To address all these data concerns
  o N1QL is SQL for JSON
Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly and disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried? Not so far…
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; change is disruptive and manual | Flexible; change is seamless and data-driven
SELECT
• Standard SELECT pipeline
  o SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET
• Queries across relationships
  o JOINs
  o Subqueries
  o NEST: a JOIN that embeds child objects within their parent
  o UNNEST: a JOIN that surfaces nested objects as top-level data
• Aggregation
• Set operators
  o UNION, INTERSECT, EXCEPT
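The descriptions of NEST and UNNEST can be made concrete with plain Python dictionaries. This is a sketch of the described behavior only, not N1QL and not the query engine: `unnest`, `nest`, the field names, and the sample documents are all hypothetical, and details such as outer variants are left out.

```python
def unnest(docs, array_field, alias):
    """Surface each element of a nested array as its own row, joined with its parent."""
    rows = []
    for doc in docs:
        for item in doc.get(array_field, []):
            rows.append({**doc, alias: item})
    return rows


def nest(parents, children_by_id, keys_field, alias):
    """Embed the child documents referenced by `keys_field` as an array in each parent."""
    rows = []
    for parent in parents:
        children = [children_by_id[k] for k in parent.get(keys_field, [])
                    if k in children_by_id]
        if children:                      # parents without matching children are dropped
            rows.append({**parent, alias: children})
    return rows


if __name__ == "__main__":
    users = [{"name": "John", "favorite_colors": ["blue", "black"]}]
    print(unnest(users, "favorite_colors", "color"))

    orders = {"o1": {"total": 10}, "o2": {"total": 25}}
    customers = [{"name": "John", "order_ids": ["o1", "o2"]},
                 {"name": "Ann", "order_ids": []}]
    print(nest(customers, orders, "order_ids", "orders"))
```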
N1QL Architecture
Single-node installation, services defined dynamically
Query service accesses Index and Data to formulate a response
All queries and direct access are topology-aware and dynamically scalable
Indexing
• CREATE / DROP INDEX
• Two types of indexes
  o View indexes
  o GSI indexes (global secondary indexes, new)
• Can index any data expression
  o Nested / complex expressions
  o Computed expressions
• EXPLAIN
Data writes*
• UPDATE … WHERE …
  o Partial updates; deep updates
• DELETE … WHERE …
  o Deeply nested conditions
• INSERT … VALUES …; INSERT … SELECT …
  o Bulk insert; transfer and transformation
• MERGE
  o INSERT or UPDATE; ETL support
*Single-document atomicity.