couchbase 101

COUCHBASE 101 Dipti Borkar Head, WW Solutions Engineering

Upload: dipti-borkar

Post on 21-Jan-2018


Category:

Data & Analytics



TRANSCRIPT

Page 1: Couchbase 101

COUCHBASE 101

Dipti Borkar, Head, WW Solutions Engineering

Page 2: Couchbase 101

©2015 Couchbase Inc. 2

Agenda

• Where does Couchbase fit in?
• Key Concepts
• Operations
• Cluster-wide operations
• Look at a Live Cluster

Page 3: Couchbase 101


Big Data = Operational + Analytic (NoSQL + Hadoop)

Operational (NoSQL)
• Online
• Web/Mobile/IoT apps
• Millions of customers/consumers

Analytic (Hadoop)
• Offline, batch-oriented
• Analytics apps
• Hundreds of business analysts

Page 4: Couchbase 101


Couchbase meets today’s & tomorrow’s requirements

Flexible data model

Consistent performance at scale

High availability

Easy, affordable scalability

24x365

Page 5: Couchbase 101


Enterprises use Couchbase to enable key objectives

360 Degree Customer View

Profile Management

Catalog

Fraud Detection

Content Management

Internet of Things

Digital Communication

Real Time Big Data

Mobile Applications

Personalization

Page 6: Couchbase 101

Key Concepts


Page 7: Couchbase 101

Couchbase can act as a key-value store or a document store

Key-Value Store

2014-06-23-10:15am : 75F
2014-06-23-11:30am : 77F
2014-06-23-02:00pm : 82F

Document Store

0001:
{
  "firstname": "Dipti",
  "lastname": "Borkar",
  "language": "English",
  "time_zone": "PST",
  "zip": 94403
}

Key - UTF-8 string up to 250 bytes

Value - can be 0 bytes – 20 MB (best practice < 1 MB)
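The size limits on this slide can be expressed as a small pre-flight check. This is a minimal sketch, not part of any Couchbase SDK: the function name `check_item` is hypothetical, and it assumes "MB" means 1024 × 1024 bytes.

```python
# Limits taken from the slide; treating 1 MB as 1024 * 1024 bytes is an assumption.
MAX_KEY_BYTES = 250
MAX_VALUE_BYTES = 20 * 1024 * 1024
RECOMMENDED_VALUE_BYTES = 1 * 1024 * 1024


def check_item(key, value):
    """Return a list of problems for a key/value pair (hypothetical helper).

    key is a str, value is bytes. An empty list means the item is within
    both the hard limits and the best-practice size.
    """
    problems = []
    key_len = len(key.encode("utf-8"))  # the limit is in bytes, not characters
    if key_len > MAX_KEY_BYTES:
        problems.append(f"key is {key_len} bytes, limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        problems.append("value exceeds the 20 MB limit")
    elif len(value) > RECOMMENDED_VALUE_BYTES:
        problems.append("value exceeds the 1 MB best-practice size")
    return problems


ok = check_item("2014-06-23-10:15am", b"75F")
too_long = check_item("é" * 126, b"")  # 126 characters, but 252 UTF-8 bytes
```

Note that a key can be under 250 characters and still over 250 bytes, which is why the check encodes the key first.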

Page 8: Couchbase 101


Fundamentals

Document ID or Key
• Similar to primary keys in relational databases
• Documents are partitioned based on the document ID
• ID-based document lookup is extremely fast
• Must be unique

Value
• JSON
• Binary - integers, strings, booleans
• Common binary values include serialized objects, compressed XML, compressed text, encrypted values

Metadata
• CAS value (unique identifier for concurrency)
• TTL
• Flags (optional client library metadata)
• Revision #
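To illustrate how a CAS value supports concurrency, here is a toy in-memory store in Python. It is only a sketch of the compare-and-swap idea: `ToyStore`, `CasMismatch` and the counter-based CAS values are assumptions made for illustration, not the Couchbase client API or the server's CAS format.

```python
import itertools


class CasMismatch(Exception):
    """Raised when a document changed since the caller last read it."""


class ToyStore:
    """Hypothetical in-memory store that keeps a CAS value next to each document."""

    def __init__(self):
        self._docs = {}                    # key -> (value, cas)
        self._next_cas = itertools.count(1)

    def upsert(self, key, value):
        cas = next(self._next_cas)         # every mutation gets a fresh CAS value
        self._docs[key] = (value, cas)
        return cas

    def get(self, key):
        return self._docs[key]             # returns (value, cas)

    def replace(self, key, value, cas):
        _, current_cas = self._docs[key]
        if cas != current_cas:             # someone else wrote in between
            raise CasMismatch(key)
        return self.upsert(key, value)


store = ToyStore()
store.upsert("0001", {"zip": 94403})

# Two writers read the same version of the document.
_, cas_a = store.get("0001")
_, cas_b = store.get("0001")

store.replace("0001", {"zip": 94404}, cas_a)      # first writer succeeds
try:
    store.replace("0001", {"zip": 94405}, cas_b)  # second writer holds a stale CAS
    second_write_failed = False
except CasMismatch:
    second_write_failed = True
```

The second writer would normally re-read the document, reapply its change and retry with the new CAS value.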

Page 9: Couchbase 101

Benefits of JSON

• Can represent complex objects and data structures
• Very simple notation, lightweight, compact, readable
• The most common API return type for integrations
  - Facebook, Twitter, you name it, return JSON
• Native to JavaScript (can be useful)
• Can be inserted straight into Couchbase (faster development)
• Serialization and deserialization are very fast

Page 10: Couchbase 101

Storing and retrieving documents

[Diagram: Clients read from / write to Documents (user/application data). Documents are held in Data Buckets, which live on Server Nodes, that form a Couchbase Cluster. Servers are dynamically scalable, based on hash partitioning.]

Page 11: Couchbase 101

Objects Serialized to JSON and Back

User Object
string uid
string firstname
string lastname
int age
array favorite_colors
string email

add(): object to JSON document
get(): JSON document to object

u::[email protected]
{
  "uid": 123456,
  "firstname": "John",
  "lastname": "Smith",
  "age": 22,
  "favorite_colors": ["blue", "black"],
  "email": "[email protected]"
}
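The round trip on this slide (object to JSON on add, JSON back to object on get) can be sketched with the Python standard library. The `User` class mirrors the slide's fields; the email address below is a made-up placeholder because the original is redacted in the transcript, and `uid` is typed as an integer to match the JSON shown rather than the "string uid" label.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    uid: int
    firstname: str
    lastname: str
    age: int
    favorite_colors: list
    email: str


def to_json(user):
    """Serialize a User into the JSON text that would be stored as the value."""
    return json.dumps(asdict(user))


def from_json(text):
    """Rebuild a User from stored JSON text."""
    return User(**json.loads(text))


# "john@example.com" is a placeholder, not the address from the slide.
user = User(123456, "John", "Smith", 22, ["blue", "black"], "john@example.com")
doc_id = "u::" + user.email          # key convention shown on the slide
stored = to_json(user)               # what add() would write
restored = from_json(stored)         # what get() would hand back
```

Because the stored value is plain JSON, any client in any language can read the same document without sharing the class definition.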

Page 12: Couchbase 101


Couchbase provides a complete Data Management solution

High availability cache

Key-value store

Document database

Embedded database

Sync management

Multi-purpose capabilities support a broad range of apps and use cases

Enterprises often start with cache, then broaden usage to other apps and use cases

Page 13: Couchbase 101

What makes Couchbase unique?

Enterprises choose Couchbase for several key advantages:

• Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
• Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform
• Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance
• Always-on availability (24x365): data replication across nodes, clusters, and data centers

Page 14: Couchbase 101

Operations

Page 15: Couchbase 101

Couchbase Server Architecture

DATA MANAGER
• Query engine
• Object-managed cache
• Storage engine
• 11210 / 11211: data access ports
• 8092: Query API

CLUSTER MANAGER (Erlang/OTP)
• HTTP: REST management API / Web UI
• Replication, rebalance, shard state manager
• 8091: Admin Console

Page 16: Couchbase 101

Single Node Operations - Write

[Diagram: the App Server writes a Doc into the Managed Cache. From there the Doc is placed on the Replication Queue (memory-to-memory replication to other node) and on the Disk Queue, which writes it to Disk.]

Page 17: Couchbase 101

Single Node Operations - Read

[Diagram: the App Server issues "Get Doc 1" and Doc 1 is served from the Managed Cache. Also shown: Disk, Disk Queue, Replication Queue (memory-to-memory replication to other node).]

Page 18: Couchbase 101

Single Node Operations – Cache Ejection

[Diagram: the App Server writes Doc 2 through Doc 6. They fill the Managed Cache and are written to Disk; Doc 1 is ejected from the Managed Cache and remains on Disk. Also shown: Disk Queue, Replication Queue (memory-to-memory replication to other node).]

Page 19: Couchbase 101

Single Node Operations – Cache Miss

[Diagram: the App Server issues "Get Doc 1". The Managed Cache holds Doc 2 through Doc 6 but not Doc 1, so Doc 1 is fetched from Disk into the Managed Cache and returned. Also shown: Disk Queue, Replication Queue (memory-to-memory replication to other node).]
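The four single-node slides (write, read, cache ejection, cache miss) can be tied together with a toy model. This is a sketch under loud assumptions: `ToyNode`, its capacity counted in documents, and the "eject the oldest already-persisted document" policy are all invented for illustration and do not describe the server's actual ejection logic.

```python
from collections import deque


class ToyNode:
    """Hypothetical model of one node: managed cache, disk queue, replication queue, disk."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity  # max docs in cache (toy unit, not bytes)
        self.cache = {}                       # managed cache, oldest entry first
        self.dirty = set()                    # keys not yet persisted to disk
        self.disk = {}
        self.disk_queue = deque()
        self.replication_queue = deque()

    def write(self, key, value):
        self.cache.pop(key, None)             # re-insert so the key becomes the newest
        self.cache[key] = value
        self.dirty.add(key)
        self.disk_queue.append((key, value))         # persisted later by flush()
        self.replication_queue.append((key, value))  # would be sent to another node
        self._eject_if_needed()

    def flush(self):
        """Drain the disk queue onto disk."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value
        self.dirty.clear()
        self._eject_if_needed()

    def read(self, key):
        if key in self.cache:
            return self.cache[key], "cache hit"
        value = self.disk[key]                # cache miss: go to disk
        self.cache[key] = value
        self._eject_if_needed()
        return value, "cache miss"

    def _eject_if_needed(self):
        # Only persisted docs may be ejected; unpersisted ones stay in memory.
        for key in list(self.cache):
            if len(self.cache) <= self.cache_capacity:
                break
            if key not in self.dirty:
                del self.cache[key]


node = ToyNode(cache_capacity=5)
for n in range(1, 7):
    node.write(f"Doc {n}", {"n": n})
node.flush()                                  # Doc 1 is ejected, Doc 2..6 stay cached
hit = node.read("Doc 2")
miss = node.read("Doc 1")                     # fetched from disk, back in the cache
```

After the last read, Doc 1 is in the cache again and the oldest persisted entry has been ejected to make room for it.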

Page 20: Couchbase 101

Cluster-wide Operations

Page 21: Couchbase 101


Auto sharding – Bucket and vBuckets

Each bucket has active and replica data sets

Each data set has 1024 virtual buckets (vBuckets)

Documents get logically mapped to vBuckets

Document IDs always get hashed to the same virtual bucket

Virtual buckets do not have a fixed physical server location

Mapping between the virtual buckets and physical server is called the cluster map

Each virtual bucket contains 1/1024th portion of the data set

[Diagram: data buckets are divided into virtual buckets (vB) numbered 1 … 1024.]
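The two-step lookup described above (document ID to vBucket, then vBucket to server through the cluster map) might look roughly like this. The hash below is plain CRC32 modulo 1024, chosen only to keep the example short; it is an assumption and not necessarily the exact function the client libraries use. The server names and the round-robin layout are also illustrative.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024


def vbucket_for(doc_id):
    """Map a document ID to a vBucket number (illustrative hash, see lead-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Assign vBuckets to servers round-robin; index = vBucket, value = server."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(doc_id, cluster_map):
    """Find the server that currently owns the document's vBucket."""
    return cluster_map[vbucket_for(doc_id)]


cluster_map = build_cluster_map(["A", "B", "C"])
owner = server_for("0001", cluster_map)
per_server = Counter(cluster_map)   # roughly 1/3 of the vBuckets each
```

The document-to-vBucket step never changes; only the cluster map does, which is what lets vBuckets move between servers without rehashing any keys.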

Page 22: Couchbase 101

Cluster Map

[Diagram: documents read from / written to the cluster pass through Hash function (KEY), which maps them to logical partitions vB1, vB2, vB3, vB4, vB5 … vB1024. The cluster map assigns the logical partitions to physical servers A, B, C. Adding a node to scale out produces a new cluster map.]

Page 23: Couchbase 101


Cluster Map

Page 24: Couchbase 101


Cluster Map

Page 25: Couchbase 101


Cluster Map – 2 nodes added

Page 26: Couchbase 101

Multi-Node Operations

[Diagram: APP SERVER 1 and APP SERVER 2 each run the Couchbase client library with a copy of the cluster map and send read/write/update requests to three servers. SERVER 1: active shards 5, 2, 9; replica shards 4, 1, 8. SERVER 2: active shards 4, 7, 8; replica shards 6, 3, 2. SERVER 3: active shards 1, 3, 6; replica shards 7, 9, 5.]

• Docs distributed evenly across servers

• Each server stores both active and replica docs
  - Only one server is active for a given doc at a time

• Client library provides app with simple interface to database

• Cluster map provides map to which server doc is on
  - App never needs to know

• App reads, writes, updates docs

• Multiple app servers can access same document at same time

Page 27: Couchbase 101

Adding Nodes

[Diagram: SERVER 4 and SERVER 5 join SERVER 1, SERVER 2 and SERVER 3. Some active and replica shards move from the original servers to the new ones. APP SERVER 1 and APP SERVER 2, each with the Couchbase client library and cluster map, send read/write/update requests to all five servers.]

• Two servers added with one-click operation

• Docs automatically rebalance across cluster
  - Even distribution of docs
  - Minimum doc movement

• Cluster map updated

• App database calls now distributed over larger number of servers
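The two rebalance goals on this slide, even distribution and minimum movement, can be illustrated with a small sketch that moves only the surplus vBuckets from existing servers to the new ones. The function `rebalance` and the server names are hypothetical, and the real rebalance algorithm (which also places replicas) is not shown in the deck.

```python
from collections import Counter


def rebalance(cluster_map, new_servers):
    """Return a new cluster map that is even across all servers.

    Only vBuckets on over-full servers are moved, so every other vBucket
    keeps its current owner. Assumes new_servers are not already in the map.
    """
    servers = list(dict.fromkeys(cluster_map)) + list(new_servers)
    base, extra = divmod(len(cluster_map), len(servers))
    # The first `extra` servers take one vBucket more than the rest.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    counts = Counter(cluster_map)
    receivers = [s for s in servers if counts[s] < target[s]]
    new_map = list(cluster_map)
    for vb, owner in enumerate(cluster_map):
        if receivers and counts[owner] > target[owner]:
            dest = receivers[0]
            new_map[vb] = dest
            counts[owner] -= 1
            counts[dest] += 1
            if counts[dest] == target[dest]:
                receivers.pop(0)
    return new_map


# Three servers sharing 1024 vBuckets, then two more are added.
old_map = [["A", "B", "C"][vb % 3] for vb in range(1024)]
new_map = rebalance(old_map, ["D", "E"])
moved = sum(1 for old, new in zip(old_map, new_map) if old != new)
```

In this toy run only the vBuckets that end up on the two new servers move; everything else stays where it was, which is the "minimum doc movement" property.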

Page 28: Couchbase 101

Managing failures

[Diagram: five servers, with App Server 1 and App Server 2 each running the Couchbase client library with a copy of the cluster map. SERVER 1: active shards 5, 2, 9; replica shards 4, 1, 8. SERVER 2: active shards 4, 7, 8; replica shards 6, 3, 2. SERVER 3 (failed): active shards 1, 3, 6; replica shards 7, 9, 5. SERVER 4 and SERVER 5 also hold active and replica shards. The replicas of Shard 1 and Shard 3 are shown promoted to active.]

• App servers accessing shards

• Requests to Server 3 fail

• Cluster detects server failed
  o Promotes replicas of shards to active
  o Updates cluster map

• Requests for docs now go to appropriate server

• Typically rebalance would follow
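The failover steps above (promote replicas, update the cluster map) can be sketched using the shard layout from the three-server slide. `fail_over` is a hypothetical helper; marking lost replicas as `None` until a rebalance is an assumption of this toy model.

```python
def fail_over(active_map, replica_map, failed):
    """Promote replicas of the failed server's active shards.

    Both maps are dicts of shard number -> server name. Returns updated
    copies; shards that lose a copy have replica None until a rebalance.
    """
    new_active = dict(active_map)
    new_replica = dict(replica_map)
    for shard, server in active_map.items():
        if server == failed:
            new_active[shard] = replica_map[shard]  # replica becomes active
            new_replica[shard] = None
    for shard, server in replica_map.items():
        if server == failed:
            new_replica[shard] = None               # replica copy was on the failed node
    return new_active, new_replica


# Layout from the three-server slide.
active = {5: "Server 1", 2: "Server 1", 9: "Server 1",
          4: "Server 2", 7: "Server 2", 8: "Server 2",
          1: "Server 3", 3: "Server 3", 6: "Server 3"}
replica = {4: "Server 1", 1: "Server 1", 8: "Server 1",
           6: "Server 2", 3: "Server 2", 2: "Server 2",
           7: "Server 3", 9: "Server 3", 5: "Server 3"}

new_active, new_replica = fail_over(active, replica, "Server 3")
```

Clients that pick up the updated map send requests for shards 1, 3 and 6 to the surviving servers; the shards left without a replica are the reason a rebalance typically follows.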

Page 29: Couchbase 101

A look at a live cluster

Page 30: Couchbase 101

Cross Data Center Replication

XDCR

Page 31: Couchbase 101


Market leading memory-to-memory replication

New York

San Francisco

Page 32: Couchbase 101


XDCR: Cross Data Center Replication

• Application can access both clusters (master – master)
• Scales out linearly
• Different from intra-cluster replication ("CP" versus "AP")

Page 33: Couchbase 101


XDCR: Flexible topologies

• One-one, one-many, many-one
• Differently sized and resourced clusters supported

Page 34: Couchbase 101

XDCR after Write

[Diagram: within a Couchbase Server node, the App Server writes Doc 1 into the Managed Cache. Doc 1 is placed on the Replication Queue (memory-to-memory replication to other node), on the Disk Queue and then Disk, and on the XDCR Queue (new in 3.0: memory-to-memory replication to remote cluster).]

Page 35: Couchbase 101

Indexing and Querying Features

Index and Query
• Distributed indexing and querying
• Secondary indexes of JSON document content
• Flexible querying of indexes

Incremental Map-Reduce
• Distributed simple real-time analytics
• Only considers changes due to updated data

Full Text Search
• Robust integration with ElasticSearch / Solr cluster
• Flexible full text search and faceted search
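The "only considers changes due to updated data" point about incremental map-reduce can be illustrated with a toy index that re-maps just the document that changed. This is a Python sketch of the idea only: `IncrementalView` is hypothetical, the reduce step is fixed to a sum, and the view engine's own API and storage are not modelled.

```python
from collections import defaultdict


class IncrementalView:
    """Hypothetical incremental map-reduce index with a sum reduce."""

    def __init__(self, map_fn):
        self.map_fn = map_fn            # doc -> iterable of (key, value)
        self.emitted = {}               # doc_id -> rows emitted for that doc
        self.totals = defaultdict(int)  # reduced (summed) value per key
        self.map_calls = 0

    def update(self, doc_id, doc):
        """Apply one changed document without touching any other document."""
        for key, value in self.emitted.pop(doc_id, []):
            self.totals[key] -= value   # retract the doc's previous rows
        rows = list(self.map_fn(doc))
        self.map_calls += 1
        for key, value in rows:
            self.totals[key] += value
        self.emitted[doc_id] = rows


def colors(doc):
    """Map function: emit (color, 1) for each favorite color."""
    return [(color, 1) for color in doc.get("favorite_colors", [])]


view = IncrementalView(colors)
view.update("u1", {"favorite_colors": ["blue", "black"]})
view.update("u2", {"favorite_colors": ["blue"]})
view.update("u1", {"favorite_colors": ["red"]})   # only u1 is re-mapped
```

Three updates cause three map calls in total; the unchanged document u2 is never re-processed when u1 changes.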

Page 36: Couchbase 101

View processing after write

[Diagram: within a Couchbase Server node, the App Server writes Doc 1 into the Managed Cache. Doc 1 is placed on the Replication Queue (to other node) and on the Disk Queue, which writes it to Disk; Doc 1 is also shown reaching the View engine.]

Page 37: Couchbase 101

Couchbase Server Architecture - Views

[Diagram: APP SERVER 1 and APP SERVER 2, each with the Couchbase client library and cluster map, query SERVER 1, SERVER 2 and SERVER 3. Each server holds active shards and replica shards.]

• Indexing work is distributed amongst nodes

• Large data set possible

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes
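The last two bullets describe a scatter-gather pattern: each node answers from its own index and the results are combined. A minimal sketch, assuming each node's index is a list of (key, doc ID) pairs kept in sorted order; the function name `query_view` and the sample data are invented for illustration.

```python
import heapq


def query_view(node_indexes, start_key, end_key):
    """Ask every node for its rows in [start_key, end_key] and merge them.

    node_indexes is a list of per-node indexes, each a sorted list of
    (key, doc_id) tuples covering only the data stored on that node.
    """
    partials = [
        [(key, doc_id) for key, doc_id in index if start_key <= key <= end_key]
        for index in node_indexes
    ]
    # Each partial result is already sorted, so a k-way merge keeps global order.
    return list(heapq.merge(*partials))


node1 = [("Borkar", "u::1"), ("Smith", "u::4")]
node2 = [("Adams", "u::2"), ("Jones", "u::3")]
node3 = [("Young", "u::5")]

rows = query_view([node1, node2, node3], "B", "T")
```

Because every node only scans its own index, the filtering work runs in parallel and the coordinator only has to merge sorted streams.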

Page 38: Couchbase 101


Couchbase Elastic Search Connector

Page 39: Couchbase 101


Couchbase Solr Connector

Page 40: Couchbase 101

N1QL

Why SQL for NoSQL?

Page 41: Couchbase 101


Why SQL for NoSQL

JSON document model provides
• Rich structure (no assembly)
• Structure evolution (flexible schema, seamless change)

SQL provides
• Query across relationships
• Query in general

Why SQL for JSON?
• To address all these data concerns
• N1QL is SQL for JSON

Page 42: Couchbase 101


Models for Representing Data

Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly and disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried? Not so far…
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; change is disruptive and manual | Flexible; change is seamless and data-driven

Page 43: Couchbase 101


SELECT

Standard SELECT pipeline

SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET

Queries across relationships

JOINs

Subqueries

NEST — a JOIN that embeds child objects within their parent

UNNEST — a JOIN that surfaces nested objects as top-level data

Aggregation

Set operators

UNION, INTERSECT, EXCEPT
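The one-line descriptions of NEST and UNNEST above can be made concrete on plain Python dictionaries. This is only an illustration of the two shapes of output; the helper names, field names and sample documents are hypothetical, and details of N1QL's actual semantics (for example how parents without matches are handled) are not modelled.

```python
def nest(parents, children_by_id, ids_field, alias):
    """Embed matching child documents within each parent (NEST-like)."""
    result = []
    for parent in parents:
        matched = [children_by_id[cid]
                   for cid in parent.get(ids_field, [])
                   if cid in children_by_id]
        result.append({**parent, alias: matched})
    return result


def unnest(docs, array_field, alias):
    """Surface each element of a nested array as its own top-level row (UNNEST-like)."""
    rows = []
    for doc in docs:
        for item in doc.get(array_field, []):
            row = {k: v for k, v in doc.items() if k != array_field}
            row[alias] = item
            rows.append(row)
    return rows


users = [{"firstname": "John", "favorite_colors": ["blue", "black"], "order_ids": ["o1", "o2"]}]
orders = {"o1": {"total": 10}, "o2": {"total": 25}}

nested = nest(users, orders, "order_ids", "orders")
flat = unnest(users, "favorite_colors", "color")
```

NEST produces one row per parent with the children embedded as an array, while UNNEST produces one row per array element, which is the opposite direction.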

Page 44: Couchbase 101


N1QL Architecture

Single node installation, services defined dynamically

The query service accesses the Index and Data services to formulate a response

All queries and direct access are topology-aware and dynamically scalable

Page 45: Couchbase 101


Indexing

CREATE / DROP INDEX

Two types of indexes
• View indexes
• GSI indexes (global secondary indexes, new)

Can index any data expression
• Nested / complex expressions
• Computed expressions

EXPLAIN

Page 46: Couchbase 101


Data writes*

UPDATE … WHERE …
• Partial updates; deep updates

DELETE … WHERE …
• Deeply nested conditions

INSERT … VALUES …; INSERT … SELECT …
• Bulk insert; transfer and transformation

MERGE
• INSERT or UPDATE; ETL support

*Single-document atomicity.

Page 47: Couchbase 101

Q & A

Thank you.

[email protected]
@dborkar