COUCHBASE 101
Dipti Borkar, Head, WW Solutions Engineering
©2015 Couchbase Inc. 2
Agenda
Where does Couchbase fit in?
Key Concepts
Operations
Cluster-wide operations
Look at a Live Cluster
Big Data = Operational + Analytic (NoSQL + Hadoop)
Operational (NoSQL): online web/mobile/IoT apps serving millions of customers/consumers
Analytic (Hadoop): offline, batch-oriented analytics apps used by hundreds of business analysts
Couchbase meets today’s & tomorrow’s requirements
Flexible data model
Consistent performance at scale
High availability
Easy, affordable scalability
24x365
Enterprises use Couchbase to enable key objectives
360 Degree Customer View
Profile Management
Catalog
Fraud Detection
Content Management
Internet of Things
Digital Communication
Real Time Big Data
Mobile Applications
Personalization
Key Concepts
Couchbase can act as a key-value store or as a document store.
Key-value store example:
2014-06-23-10:15am : 75F
2014-06-23-11:30am : 77F
2014-06-23-02:00pm : 82F
Document store example:
0001:
{"firstname": "Dipti",
 "lastname": "Borkar",
 "language": "English",
 "time_zone": "PST",
 "zip": 94403
}
Key - UTF-8 string up to 250 bytes
Value - 0 bytes to 20 MB (best practice < 1 MB)
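The size limits above can be checked on the client before a write is sent. A minimal Python sketch, assuming the limits exactly as stated on this slide and binary megabytes; `validate_item` and its warning text are hypothetical, not part of a Couchbase SDK.

```python
MAX_KEY_BYTES = 250                     # key: UTF-8 string up to 250 bytes
MAX_VALUE_BYTES = 20 * 1024 * 1024      # value: up to 20 MB (assuming binary MB)
RECOMMENDED_VALUE_BYTES = 1024 * 1024   # best practice: under 1 MB

def validate_item(key: str, value: bytes) -> list[str]:
    """Hypothetical helper: raise ValueError on hard-limit violations,
    return a list of soft warnings otherwise."""
    key_len = len(key.encode("utf-8"))  # the limit counts bytes, not characters
    if key_len > MAX_KEY_BYTES:
        raise ValueError(f"key is {key_len} bytes; limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value is {len(value)} bytes; limit is {MAX_VALUE_BYTES}")
    warnings = []
    if len(value) > RECOMMENDED_VALUE_BYTES:
        warnings.append("value exceeds the recommended 1 MB")
    return warnings
```

Measuring the encoded key matters because the 250-byte limit applies to UTF-8 bytes, so a key of 126 two-byte characters is already too long.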
Fundamentals
Document ID or Key
• Similar to primary keys in relational databases
• Documents are partitioned based on the document ID
• ID-based document lookup is extremely fast
• Must be unique
Value
• JSON
• Binary: integers, strings, booleans
• Common binary values include serialized objects, compressed XML, compressed text, encrypted values
Metadata
• CAS value (unique identifier for concurrency)
• TTL
• Flags (optional client library metadata)
• Revision #
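The CAS and TTL metadata fields support optimistic concurrency and automatic expiry. The sketch below models both in plain Python under stated assumptions: `TinyStore`, `CasError`, and the integer CAS counter are hypothetical stand-ins, not the Couchbase SDK, and real CAS values are opaque server-generated identifiers.

```python
import itertools
import time

class CasError(Exception):
    """Raised when a write's CAS value no longer matches the stored one."""

class TinyStore:
    """Hypothetical in-memory store illustrating CAS and TTL metadata.
    This is not the Couchbase SDK API."""

    def __init__(self, clock=time.monotonic):
        self._items = {}                     # key -> (value, cas, expires_at or None)
        self._next_cas = itertools.count(1)  # source of fresh CAS values
        self._clock = clock

    def _live(self, key):
        item = self._items.get(key)
        if item is not None and item[2] is not None and self._clock() >= item[2]:
            del self._items[key]             # TTL elapsed: treat as deleted
            return None
        return item

    def get(self, key):
        item = self._live(key)
        if item is None:
            raise KeyError(key)
        return item[0], item[1]              # value plus the CAS seen at read time

    def set(self, key, value, ttl=None, cas=None):
        item = self._live(key)
        if cas is not None and (item is None or item[1] != cas):
            raise CasError(key)              # another writer changed the doc first
        expires_at = None if ttl is None else self._clock() + ttl
        new_cas = next(self._next_cas)       # every mutation gets a new CAS
        self._items[key] = (value, new_cas, expires_at)
        return new_cas
```

A typical pattern is read-modify-write: `get()` the value and its CAS, change the value, then `set(..., cas=old_cas)`; if `CasError` is raised, another client wrote first, so re-read and retry.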
Benefits of JSON
Can represent complex objects and data structures
Very simple notation: lightweight, compact, readable
The most common API return type for integrations (Facebook, Twitter, you name it, return JSON)
Native to JavaScript (can be useful)
Can be inserted straight into Couchbase (faster development)
Serialization and deserialization are very fast
Storing and retrieving documents
©2014 Couchbase, Inc.
Clients read documents from and write documents to data buckets.
Data buckets live on server nodes, which together form a Couchbase cluster.
User/application data is distributed across the servers based on hash partitioning, and the cluster is dynamically scalable.
Objects Serialized to JSON and Back
User object:
string uid
string firstname
string lastname
int age
array favorite_colors
string email
add() serializes the object to JSON and stores it under the key u::[email protected]:
{
  "uid": 123456,
  "firstname": "John",
  "lastname": "Smith",
  "age": 22,
  "favorite_colors": ["blue", "black"],
  "email": "[email protected]"
}
get() retrieves the JSON and deserializes it back into a User object.
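The add()/get() round trip can be sketched with Python's standard `json` module and a dataclass. The `User` fields follow the JSON example (with `uid` as an integer to match the stored value); the `u::` key helper follows the slide's key prefix, and the address `john.smith@example.com` is a hypothetical placeholder. No Couchbase client calls are shown.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class User:
    uid: int
    firstname: str
    lastname: str
    age: int
    favorite_colors: list = field(default_factory=list)
    email: str = ""

def user_key(user: User) -> str:
    # Hypothetical key pattern: "u::" prefix followed by the user's email
    return "u::" + user.email

def to_document(user: User) -> str:
    # Object -> JSON text (what an add() would store)
    return json.dumps(asdict(user))

def from_document(doc: str) -> User:
    # JSON text -> object (what a get() would hand back to the app)
    return User(**json.loads(doc))
```

Keeping the key derivation in one helper means every code path that reads or writes a user agrees on the document ID.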
Couchbase provides a complete Data Management solution
High availability cache
Key-value store
Document database
Embedded database
Sync management
Multi-purpose capabilities support a broad range of apps and use cases
Enterprises often start with cache, then broaden usage to other apps and use cases
What makes Couchbase unique?
Enterprises choose Couchbase for several key advantages:
Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform
Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance
Always-on availability (24x365): data replication across nodes, clusters, and data centers
Operations
Couchbase Server Architecture
Data Manager
• Query engine
• Object-managed cache
• Storage engine
• Ports 11210 / 11211: data access
• Port 8092: query API
Cluster Manager (Erlang/OTP)
• Replication, rebalance, shard state manager
• REST management API / Web UI over HTTP
• Port 8091: Admin Console
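Single Node Operations - Write
1. The app server writes Doc to the node's managed cache.
2. The write is acknowledged to the app server.
3. Doc is placed on the disk queue, from which it is persisted to disk, and on the replication queue, from which it is replicated memory-to-memory to another node.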
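The write path above can be modeled in a few lines to show why writes are fast: the acknowledgment does not wait for disk or replication. This is a hypothetical single-process model (`NodeWritePath` and its methods are illustrative names), with the background flushers invoked by hand rather than running asynchronously.

```python
from collections import deque

class NodeWritePath:
    """Hypothetical model of one node's write path: cache, ack, then queues."""

    def __init__(self):
        self.cache = {}                    # managed cache (memory)
        self.disk = {}                     # persisted documents
        self.disk_queue = deque()          # keys waiting to be written to disk
        self.replication_queue = deque()   # keys waiting to go to a replica node

    def write(self, key, doc):
        self.cache[key] = doc              # store the doc in the managed cache
        self.disk_queue.append(key)        # schedule persistence
        self.replication_queue.append(key) # schedule memory-to-memory replication
        return "ack"                       # acknowledge without waiting for either

    def flush_disk_queue(self):
        """Background work: persist the latest cached value of each queued key."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication_queue(self, replica):
        """Background work: copy queued docs into another node's cache."""
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.cache[key] = self.cache[key]
```

The trade-off is visible in the model: between `write()` returning and the queues draining, the document exists only in one node's memory.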
Single Node Operations - Read
The app server issues Get Doc 1. Doc 1 is in the managed cache, so the node returns it directly from memory.
Single Node Operations – Cache Ejection
As the managed cache fills, documents that have already been persisted to disk (Doc 2 through Doc 6 in the diagram) are ejected from memory; their copies remain on disk.
Single Node Operations – Cache Miss
The app server issues Get Doc 1, but Doc 1 is no longer in the managed cache. The node reads Doc 1 from disk, loads it back into the managed cache, and returns it to the app server.
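Reads, ejection, and cache misses can be modeled together with a bounded cache in front of a disk map. Assumptions: every document is persisted before it can be ejected, and least-recently-used order is a simplification; Couchbase's actual ejection policy is not described on these slides. `NodeReadPath` is a hypothetical name.

```python
from collections import OrderedDict

class NodeReadPath:
    """Hypothetical model of reads, ejection, and cache misses on one node.
    Assumes every doc is already on disk, so ejection never loses data."""

    def __init__(self, cache_capacity):
        self.capacity = cache_capacity
        self.cache = OrderedDict()   # key -> doc, least recently used first
        self.disk = {}               # persisted copies
        self.disk_reads = 0

    def _eject_if_needed(self):
        # Eject least-recently-used docs from memory; their copies stay on disk
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def store(self, key, doc):
        self.disk[key] = doc
        self.cache[key] = doc
        self.cache.move_to_end(key)
        self._eject_if_needed()

    def get(self, key):
        if key in self.cache:        # cache hit: served from memory
            self.cache.move_to_end(key)
            return self.cache[key]
        doc = self.disk[key]         # cache miss: read from disk (KeyError if absent)
        self.disk_reads += 1
        self.cache[key] = doc        # load back into the managed cache
        self._eject_if_needed()
        return doc
```

With a capacity of two, storing three docs ejects the oldest; reading it again costs one disk read and pushes another doc out of memory.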
Cluster-wide Operations
Auto sharding – Buckets and vBuckets
Each bucket has active and replica data sets
Each data set has 1024 virtual buckets (vBuckets)
Documents are logically mapped to vBuckets
A document ID always hashes to the same vBucket
vBuckets do not have a fixed physical server location
The mapping between vBuckets and physical servers is called the cluster map
Each vBucket contains 1/1024th of the data set
[Diagram: a data bucket divided into virtual buckets vB 1 ….. vB 1024]
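The key-to-vBucket step can be sketched as a pure function of the document ID. Assumption: the CRC32-based formula below is the scheme commonly attributed to Couchbase client libraries; verify it against your SDK before relying on it. vBuckets are numbered 0 to 1023 here rather than 1 to 1024.

```python
import zlib

NUM_VBUCKETS = 1024  # vBuckets per data set, as on the slide

def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document ID to a vBucket number in [0, num_vbuckets).

    Assumption: CRC32 of the key bytes, keep bits 16-30, then reduce modulo
    the vBucket count. Illustrative, not authoritative.
    """
    crc = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit CRC in Python 3
    return ((crc >> 16) & 0x7FFF) % num_vbuckets
```

Because the result depends only on the key, every client computes the same vBucket for a given document ID; only the vBucket-to-server mapping changes as the cluster grows.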
Cluster Map
[Diagram: documents are read from / written to the cluster; a hash function on the KEY selects one of the logical partitions vB1 … vB1024, and the cluster map assigns each vBucket to a physical server (A, B, C). Adding a node to scale out produces a new cluster map.]
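The two-step lookup in the diagram (hash the key to a vBucket, then look the vBucket up in the cluster map) can be sketched as follows. The round-robin `build_cluster_map` placement and the CRC32 hash are assumptions for illustration; a real cluster map is produced by the cluster manager.

```python
import zlib

def vbucket_for_key(key, num_vbuckets):
    # Assumed CRC32-based key hash (illustrative, not authoritative)
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets

def build_cluster_map(servers, num_vbuckets=1024):
    """Hypothetical placement: assign vBuckets to servers round-robin."""
    return [servers[vb % len(servers)] for vb in range(num_vbuckets)]

def server_for_key(key, cluster_map):
    """Two-step lookup: key -> vBucket (fixed) -> server (via the cluster map)."""
    vb = vbucket_for_key(key, len(cluster_map))
    return vb, cluster_map[vb]
```

Scaling out replaces only the list returned by `build_cluster_map`; the key-to-vBucket step is untouched, which is what lets documents move between servers without rehashing.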
Cluster Map – 2 nodes added
Multi-Node Operations
[Diagram: APP SERVER 1 and APP SERVER 2, each running the Couchbase client library with a copy of the cluster map, issue read/write/update requests to SERVER 1, SERVER 2, and SERVER 3. Active shards: SERVER 1 holds 5, 2, 9; SERVER 2 holds 4, 7, 8; SERVER 3 holds 1, 3, 6. Replica shards: SERVER 1 holds 4, 1, 8; SERVER 2 holds 6, 3, 2; SERVER 3 holds 7, 9, 5.]
• Docs distributed evenly across servers
• Each server stores both active and replica docs; only one server holds the active copy of a given doc at a time
• Client library provides app with simple interface to database
• Cluster map tells the client library which server a doc is on; the app never needs to know
• App reads, writes, updates docs
• Multiple app servers can access the same document at the same time
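Each vBucket entry in the cluster map can carry both an active server and a replica server. A hypothetical placement sketch: replicas go to the next server in the list so the two copies never share a server, and routing always targets the active copy, which is why every app server reaches the same version of a document. `build_map_with_replica` and `route` are illustrative names.

```python
def build_map_with_replica(servers, num_vbuckets=1024):
    """Hypothetical placement: each vBucket gets (active, replica) on two
    different servers; requires at least two servers."""
    n = len(servers)
    if n < 2:
        raise ValueError("need at least two servers to place a replica")
    return [(servers[vb % n], servers[(vb + 1) % n]) for vb in range(num_vbuckets)]

def route(vb, vbmap):
    """Client-side routing: reads and writes go to the active copy only."""
    active, _replica = vbmap[vb]
    return active
```

Since every app server holds the same map, two clients touching the same vBucket are sent to the same active server.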
Adding Nodes
[Diagram: SERVER 4 and SERVER 5 join SERVER 1, SERVER 2, and SERVER 3. Some active and replica shards move from the original servers onto the new ones, while APP SERVER 1 and APP SERVER 2 continue read/write/update operations through the Couchbase client library and its cluster map.]
• Two servers added with a one-click operation
• Docs automatically rebalance across the cluster
  - Even distribution of docs
  - Minimum doc movement
• Cluster map updated
• App database calls now distributed over a larger number of servers
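The "even distribution, minimum doc movement" goal of rebalancing can be sketched as a greedy reassignment of vBuckets. This is an illustrative algorithm, not Couchbase's rebalance implementation: each server keeps vBuckets up to its target count, and only the surplus moves to servers below target.

```python
from collections import Counter

def rebalance(cluster_map, servers):
    """Greedy sketch: return a new vBucket -> server map in which every server
    in `servers` holds floor or ceil(len(cluster_map) / len(servers)) vBuckets,
    moving only the vBuckets that exceed a server's target."""
    base, extra = divmod(len(cluster_map), len(servers))
    # Target load per server; the first `extra` servers take one more vBucket
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    kept = Counter()
    new_map = list(cluster_map)
    surplus = []
    for vb, owner in enumerate(cluster_map):
        if owner in target and kept[owner] < target[owner]:
            kept[owner] += 1          # stays put: no data movement
        else:
            surplus.append(vb)        # over target, or owner left the cluster
    for vb in surplus:
        # Move each surplus vBucket to the first server still below its target
        dest = next(s for s in servers if kept[s] < target[s])
        kept[dest] += 1
        new_map[vb] = dest
    return new_map
```

Starting from 3 servers with 1024 vBuckets and growing to 5, only 409 vBuckets move, exactly the number the two new servers need to reach their targets.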
Managing failures
[Diagram: SERVER 1 through SERVER 5 each hold active and replica shards; App Server 1 and App Server 2 each run the Couchbase client library with a cluster map. SERVER 3 (active shards 1, 3, 6; replica shards 7, 9, 5) fails, and replicas of its active shards on other servers are promoted to active.]
• App servers accessing shards
• Requests to Server 3 fail
• Cluster detects that the server failed
  o Promotes replicas of its shards to active
  o Updates the cluster map
• Requests for docs now go to the appropriate server
• Typically a rebalance would follow
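The failover steps above can be sketched over a vBucket map of (active, replica) pairs. `fail_over` is a hypothetical helper; in a real cluster, failure detection and map updates are handled by the cluster manager, and the missing replicas are recreated by the rebalance that typically follows.

```python
def fail_over(vbmap, failed):
    """Hypothetical failover over a vBucket map of (active, replica) pairs.

    vBuckets whose active copy lived on `failed` get their replica promoted;
    vBuckets that lose their replica keep running without one until a
    rebalance creates new replicas.
    """
    new_map = []
    for active, replica in vbmap:
        if active == failed:
            if replica is None:
                raise RuntimeError("no replica to promote; data unavailable")
            new_map.append((replica, None))   # promote replica to active
        elif replica == failed:
            new_map.append((active, None))    # replica lost, active unaffected
        else:
            new_map.append((active, replica))
    return new_map
```

After failover no entry points at the failed server, so clients using the updated map stop sending requests there.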
A look at a live cluster
Cross Data Center Replication (XDCR)
Market-leading memory-to-memory replication
[Diagram: replication between clusters in New York and San Francisco]
XDCR: Cross Data Center Replication
Application can access both clusters (master-master)
Scales out linearly
Different from intra-cluster replication (intra-cluster replication is "CP"; XDCR is "AP")
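With both clusters accepting writes (master-master), the same document can change in two places, so replication needs a deterministic winner. Assumption: the sketch uses a "most updates wins" rule on a per-document revision number with CAS as a tie-breaker, which is how XDCR conflict resolution is commonly described; the exact rule may differ between versions. `DocVersion`, `resolve`, and `replicate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DocVersion:
    value: object
    rev: int   # revision number: incremented on every mutation
    cas: int   # tie-breaker when revision numbers are equal

def resolve(local: DocVersion, remote: DocVersion) -> DocVersion:
    """Pick the winner when both clusters changed the same document.
    Assumed rule: higher revision wins, then higher CAS."""
    return max(local, remote, key=lambda d: (d.rev, d.cas))

def replicate(source: dict, target: dict) -> None:
    """One direction of master-master replication: push every doc from
    source to target, keeping the winner where both have a version."""
    for key, incoming in source.items():
        current = target.get(key)
        target[key] = incoming if current is None else resolve(current, incoming)
```

Because both directions apply the same rule, New York and San Francisco converge on the same version regardless of replication order; this eventual consistency is the "AP" behavior that distinguishes XDCR from intra-cluster replication.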
XDCR: Flexible topologies
One-one, one-many, many-one
Differently sized and resourced clusters supported
XDCR after Write
On a Couchbase Server node, the app server writes Doc 1 to the managed cache. From there Doc 1 is placed on:
• the disk queue, for persistence to disk
• the replication queue, for memory-to-memory replication to another node
• the XDCR queue (new in 3.0), for memory-to-memory replication to a remote cluster
Indexing and Querying Features
Index and Query
• Distributed indexing and querying
• Secondary indexes of JSON document content
• Flexible querying of indexes
Incremental Map-Reduce
• Distributed, simple real-time analytics
• Only considers changes due to updated data
Full Text Search
• Robust integration with an ElasticSearch / Solr cluster
• Flexible full text search and faceted search
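Incremental map-reduce means an index update touches only documents that changed since the last update. A hypothetical Python sketch of a count-per-type view: each change retracts the document's previous emission and adds its new one, so the cost is proportional to the number of changes, not the size of the data set. `IncrementalCountView` and the `type` field are illustrative.

```python
from collections import defaultdict

class IncrementalCountView:
    """Hypothetical incremental map-reduce view: counts documents per "type"."""

    def __init__(self):
        self.emitted = {}                 # doc_id -> key last emitted by map_doc()
        self.counts = defaultdict(int)    # reduce output: key -> count

    @staticmethod
    def map_doc(doc):
        return doc.get("type")            # map step: one key per document (or None)

    def apply_changes(self, changes):
        """Process only changed docs: doc_id -> new doc, or None if deleted."""
        for doc_id, doc in changes.items():
            old_key = self.emitted.pop(doc_id, None)
            if old_key is not None:       # retract the document's previous emission
                self.counts[old_key] -= 1
                if self.counts[old_key] == 0:
                    del self.counts[old_key]
            new_key = None if doc is None else self.map_doc(doc)
            if new_key is not None:       # add the new emission
                self.emitted[doc_id] = new_key
                self.counts[new_key] += 1
```

Remembering what each document last emitted is what makes updates and deletes cheap: the view never has to rescan unchanged documents.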
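View processing after write
On a Couchbase Server node, the app server writes Doc 1 to the managed cache. Doc 1 is placed on the disk queue and on the replication queue (to another node); once Doc 1 has been persisted to disk, the view engine processes it for indexing.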
Couchbase Server Architecture - Views
[Diagram: SERVER 1, SERVER 2, and SERVER 3 each hold active and replica shards along with an index over their own data; APP SERVER 1 and APP SERVER 2 query through the Couchbase client library and cluster map.]
• Indexing work is distributed amongst nodes
• Large data sets possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
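Because each node indexes only its own data, a query has to gather partial results from the nodes and combine them. A minimal scatter-gather sketch, assuming each node's index is a sorted list of `(key, doc_id)` rows; `heapq.merge` combines the sorted partial results without re-sorting. The function name and row layout are illustrative.

```python
import heapq
from itertools import islice

def scatter_gather(node_indexes, start, end, limit=None):
    """Hypothetical range query over per-node indexes.

    Each element of node_indexes is one node's sorted list of (key, doc_id)
    rows covering only the documents stored on that node.
    """
    # Scatter: each node filters its own index (in parallel in a real cluster)
    partials = [[row for row in index if start <= row[0] < end] for index in node_indexes]
    # Gather: merge the already-sorted partial results into one sorted stream
    merged = heapq.merge(*partials)
    return list(merged if limit is None else islice(merged, limit))
```

Merging lazily means a query with a small limit stops consuming partial results as soon as enough rows have been produced.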
Couchbase Elastic Search Connector
Couchbase Solr Connector
N1QL
Why SQL for NoSQL?
JSON document model provides:
Rich structure (no assembly)
Structure evolution (flexible schema, seamless change)
SQL provides:
Query across relationships
Query in general
Why SQL for JSON? To address all these data concerns. N1QL is SQL for JSON.
Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly and disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried? Not so far…
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; change is disruptive and manual | Flexible; change is seamless and data-driven
SELECT
Standard SELECT pipeline
SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET
Queries across relationships
JOINs
Subqueries
NEST — a JOIN that embeds child objects within their parent
UNNEST — a JOIN that surfaces nested objects as top-level data
Aggregation
Set operators
UNION, INTERSECT, EXCEPT
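NEST and UNNEST are easiest to see on plain JSON-like data. The sketch emulates their effect in Python rather than showing N1QL syntax: `unnest` produces one row per array element paired with its parent, and `nest` gathers the documents whose keys a parent lists into an embedded array (in the spirit of `NEST ... ON KEYS`). Both functions and the output field names (`parent`, `child`, `left`, `nested`) are hypothetical, and only the inner-join form is modeled.

```python
def unnest(docs, array_field):
    """Emulate UNNEST: one output row per element of `array_field`, paired
    with its parent; parents with an empty or missing array yield no rows."""
    return [
        {"parent": doc, "child": child}
        for doc in docs
        for child in doc.get(array_field, [])
    ]

def nest(left_docs, right_by_id, keys_field):
    """Emulate NEST ... ON KEYS: for each left doc, gather the right-hand docs
    whose IDs it lists in `keys_field` into one embedded array; left docs
    with no matches are dropped, as in an inner NEST."""
    out = []
    for doc in left_docs:
        matched = [right_by_id[k] for k in doc.get(keys_field, []) if k in right_by_id]
        if matched:
            out.append({"left": doc, "nested": matched})
    return out
```

The two are inverses in spirit: UNNEST flattens an embedded array into top-level rows, while NEST collects related documents back into an array inside their parent.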
N1QL Architecture
Single node installation, services defined dynamically
The query service accesses the index and data services to formulate a response
All queries and direct access are topology-aware and dynamically scalable
Indexing
CREATE / DROP INDEX
Two types of indexes:
• View indexes
• GSI indexes (global secondary indexes, new)
Can index any data expression:
• Nested / complex expressions
• Computed expressions
EXPLAIN
Data writes*
UPDATE … WHERE …: partial updates; deep updates
DELETE … WHERE …: deeply nested conditions
INSERT … VALUES …; INSERT … SELECT …: bulk insert; transfer and transformation
MERGE: INSERT or UPDATE; ETL support
*Single-document atomicity.
Q & A
Thank you.
[email protected]
@dborkar