Post on 18-Jul-2015

Page 1: Couchbase server in production   boston

1

Couchbase Server in Production

Matt Ingenthron, Director, Developer Solutions

Page 2: Couchbase server in production   boston

2

Typical Couchbase production environment

Application users

Load Balancer

Application Servers

Servers

Page 3: Couchbase server in production   boston

3

We’ll focus on App-Couchbase interaction …

Application users

Load Balancer

Application Servers

Servers

Page 4: Couchbase server in production   boston

4

… at each step of the application lifecycle

Dev/Test Size Deploy Monitor Manage

Page 5: Couchbase server in production   boston

5

KEY CONCEPTS

Page 6: Couchbase server in production   boston

6

Reading, Writing and Arithmetic

Reading Data

Application Server: “Give me document A”

Server: “Here is document A”

Writing Data

Application Server: “Please store document A”

Server: “OK, I stored document A”

(We’ll save the arithmetic for the sizing section :)

Page 7: Couchbase server in production   boston

7

Reading data

Application Server: “Give me document A”

Server: “Here is document A”

If document A is in memory
    return document A to the application
Else
    add document to read queue
    reader eventually loads document from disk into memory
    return document A to the application

(Diagram: an Application Server and a Server with RAM and DISK)

Reading Data
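The read path on this slide can be sketched as a toy model in plain Java. This is not Couchbase code: the class name, the two maps standing in for RAM and disk, and the counters are all hypothetical, and the read queue is drained synchronously here, whereas the slide describes a reader that loads documents "eventually".

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the slide's read path (hypothetical, not the server's implementation).
class ReadPathSketch {
    private final Map<String, String> ram = new HashMap<>();    // documents resident in memory
    private final Map<String, String> disk = new HashMap<>();   // documents persisted on disk
    private final Queue<String> readQueue = new ArrayDeque<>(); // keys waiting for a disk read
    long memoryHits = 0;
    long diskFetches = 0;

    // Places a document on "disk" only, so the first read has to fetch it.
    void putOnDisk(String key, String doc) {
        disk.put(key, doc);
    }

    String get(String key) {
        String doc = ram.get(key);
        if (doc != null) {
            memoryHits++;          // "If document A is in memory": return it directly
            return doc;
        }
        readQueue.add(key);        // "Else": add the document to the read queue
        drainReadQueue();          // simplification: the reader runs immediately
        return ram.get(key);       // null if the document does not exist at all
    }

    // The "reader": loads queued documents from disk into memory.
    private void drainReadQueue() {
        String key;
        while ((key = readQueue.poll()) != null) {
            String doc = disk.get(key);
            if (doc != null) {
                ram.put(key, doc);
                diskFetches++;
            }
        }
    }
}
```

In this model the first `get` of a document counts as a disk fetch and later reads of the same key count as memory hits, which is the distinction the following slides build on.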

Page 8: Couchbase server in production   boston

8

Keeping working data set in RAM is key to read performance

Your application’s working set should fit in RAM…

… or else! (You don’t want the “else” path taken very often: it is MUCH slower than a memory read, and the read may have to wait in line for an indeterminate amount of time.)

Reading Data

Page 9: Couchbase server in production   boston

9

Working set ratio depends on your application

Late stage social game: many users no longer active; few logged in at any given time. working/total set = .01

Ad Network: any cookie can show up at any time. working/total set = 1

Business application: users logged in during the day. Day moves around the globe. working/total set = .33

Reading Data

Page 10: Couchbase server in production   boston

10

Couchbase in operation: Writing data

Application Server: “Store document A”

Server: “OK, it is stored”

If there is room for the document in RAM
    store the document in RAM
Else
    eject other document(s) from RAM
    store the document in RAM

Add the document to the replication queue
    replicator eventually transmits document

Add the document to write queue
    writer eventually writes document to disk

(Diagram: an Application Server and a Server with RAM and DISK)

Writing Data
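A minimal sketch of the write path described above, again as a toy model rather than server code. The RAM capacity counted in documents, the "eject the oldest document" policy, and all names are assumptions made for illustration; the slide does not say which documents are ejected.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Queue;

// Toy model of the slide's write path (hypothetical names and eviction policy).
class WritePathSketch {
    private final int ramCapacity;  // max documents held in RAM (illustrative unit)
    private final LinkedHashMap<String, String> ram = new LinkedHashMap<>();
    final Queue<String> replicationQueue = new ArrayDeque<>(); // drained by the "replicator"
    final Queue<String> diskWriteQueue = new ArrayDeque<>();   // drained by the "writer"
    int ejected = 0;

    WritePathSketch(int ramCapacity) {
        if (ramCapacity < 1) {
            throw new IllegalArgumentException("ramCapacity must be at least 1");
        }
        this.ramCapacity = ramCapacity;
    }

    String store(String key, String doc) {
        if (!ram.containsKey(key) && ram.size() >= ramCapacity) {
            // No room: eject another document (assumption: the oldest one inserted).
            String oldest = ram.keySet().iterator().next();
            ram.remove(oldest);
            ejected++;
        }
        ram.put(key, doc);           // store the document in RAM
        replicationQueue.add(key);   // replicator eventually transmits it
        diskWriteQueue.add(key);     // writer eventually writes it to disk
        return "OK";                 // acknowledged before either queue is drained
    }

    boolean inRam(String key) {
        return ram.containsKey(key);
    }
}
```

The point of the sketch is the ordering: the application gets its acknowledgement once the document is in RAM and queued, so the two queues absorb the difference between arrival rate and drain rate.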

Page 11: Couchbase server in production   boston

11

Flow of data when writing

Writing Data

(Diagram: three Application Servers connected over the network to one Server)

Applications writing to Couchbase

Couchbase writing to disk

Couchbase transmitting replicas

Page 12: Couchbase server in production   boston

12

Queues build if aggregate arrival rate exceeds drain rates

Writing Data

(Diagram: three Application Servers writing over the network to one Server)

Replication queue Disk write queue
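The queue behaviour on this slide is simple arithmetic, sketched below under the assumption of constant rates. The method name and the rates used in the usage note are illustrative, not measurements.

```java
// Backlog arithmetic for a queue with constant arrival and drain rates (illustrative).
class QueueGrowthSketch {
    // Returns the queue length after `seconds`, never below zero.
    static long backlog(long startBacklog, long arrivalPerSec, long drainPerSec, long seconds) {
        long netChange = (arrivalPerSec - drainPerSec) * seconds;
        return Math.max(0, startBacklog + netChange);
    }
}
```

For example, with writes arriving at 12,000 items per second and a disk draining 10,000 per second, `backlog(0, 12000, 10000, 60)` gives a queue of 120,000 items after one minute, and it keeps growing for as long as the imbalance lasts.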

Page 13: Couchbase server in production   boston

13

Scaling out permits matching of aggregate flow rates so queues do not grow

(Diagram: three Application Servers writing over the network to three Servers)
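As a rough sketch of the scale-out argument: if writes are spread evenly across nodes (an assumption), the node count needed to keep queues from growing is the aggregate arrival rate divided by what one node can drain, rounded up. The names and figures here are hypothetical.

```java
// How many nodes are needed so aggregate drain rate matches aggregate arrival rate.
class ScaleOutSketch {
    static int nodesToMatch(long aggregateArrivalPerSec, long perNodeDrainPerSec) {
        if (perNodeDrainPerSec <= 0) {
            throw new IllegalArgumentException("perNodeDrainPerSec must be positive");
        }
        // Integer ceiling division; a cluster always has at least one node.
        long nodes = (aggregateArrivalPerSec + perNodeDrainPerSec - 1) / perNodeDrainPerSec;
        return (int) Math.max(1, nodes);
    }
}
```

With 25,000 writes per second arriving and each node able to drain 10,000 per second, `nodesToMatch(25000, 10000)` returns 3.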

Page 14: Couchbase server in production   boston

14

DEVELOPMENT

Dev-Test Size Deploy Monitor Manage

Page 15: Couchbase server in production   boston

15

Couchbase SDKs

Java SDK

.Net SDK

PHP SDK

Ruby SDK

…and many more

Java client API

User Code

Couchbase Server

CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
// this is all the same as before
cb.set("hello", 0, "world");
cb.get("hello");

http://www.couchbase.com/develop

Couchbase Java Library (spymemcached)

Page 16: Couchbase server in production   boston

16

Couchbase Data

• Couchbase uses (and is completely compatible with) the memcached protocol.

• While you can use any standard memcached library, Couchbase also provides its own libraries for a variety of languages.

• Couchbase is document-oriented

• See http://www.couchbase.com/develop

Page 17: Couchbase server in production   boston

17

Couchbase Client Deployment

Option 1, (“smart”) library, on the application server:
Farm Town Wars App Code → Couchbase Java Client library → Couchbase Server (ports 11210 and 8091)

OR

Option 2, client-side Moxi, on the application server:
Farm Town Wars App Code → Memcached Client → Moxi (Couchbase proxy) → Couchbase Server (ports 11210 and 8091)

Page 18: Couchbase server in production   boston

18

SERVER AND CLUSTER SIZING (TIME FOR THE ARITHMETIC)

Dev-Test Size Deploy Monitor Manage

Page 19: Couchbase server in production   boston

19

Size Couchbase Server

Sizing == performance
• Serve reads out of RAM
• Enough IO for writes
• Mitigate inevitable failures

Reading Data

Application Server: “Give me document A”

Server: “Here is document A”

Writing Data

Application Server: “Please store document A”

Server: “OK, I stored document A”

Page 20: Couchbase server in production   boston

20

How many nodes?

4 Key Factors determine number of nodes needed:

1) RAM
2) Disk
3) Network
4) Data Distribution/Safety

Couchbase Servers

Web application server

Application user

Page 21: Couchbase server in production   boston

21

RAM sizing

1) RAM
• Working set
• Metadata
• Buffer/overhead
• Active+Replica(s)

Keep working set in RAM for best read performance

Application Server: “Give me document A”

Server: “Here is document A”

Reading Data
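One way to turn the four RAM factors above into arithmetic is sketched below. The formula is an assumption made for illustration, not an official sizing formula from the deck: it keeps metadata for every document in RAM, keeps only the working set of document bodies in RAM, multiplies both by the number of copies (active plus replicas), and reserves a fraction of RAM as buffer/overhead. The metadata size per document and the headroom fraction are hypothetical inputs that you would take from your own server version and measurements.

```java
// Back-of-the-envelope RAM sizing from the slide's four factors (assumed formula).
class RamSizingSketch {
    static double requiredRamBytes(long totalDocs, long avgDocBytes, double workingSetRatio,
                                   long metadataBytesPerDoc, int replicas,
                                   double headroomFraction) {
        if (headroomFraction < 0 || headroomFraction >= 1) {
            throw new IllegalArgumentException("headroomFraction must be in [0, 1)");
        }
        int copies = 1 + replicas;                                     // Active + Replica(s)
        double metadata = (double) totalDocs * metadataBytesPerDoc * copies;
        double workingSet = totalDocs * workingSetRatio * avgDocBytes * copies;
        return (metadata + workingSet) / (1.0 - headroomFraction);    // Buffer/overhead
    }

    static int nodesForRam(double requiredRamBytes, long ramPerNodeBytes) {
        return (int) Math.ceil(requiredRamBytes / ramPerNodeBytes);
    }
}
```

For example, 1,000,000 documents of 1,000 bytes with a working set ratio of 0.25, 100 bytes of metadata per document, one replica and 25% headroom come to roughly 933 MB, which needs 3 nodes if each node contributes 400 MB of RAM.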

Page 22: Couchbase server in production   boston

23

Disk sizing: Space and I/O

2) Disk
• Sustained write rate
• Rebalance capacity
• Backups
• Total dataset
• Active+Replicas

(Diagram labels: I/O, Space)

Application Server: “Please store document A”

Server: “OK, I stored document A”

Writing Data

Page 23: Couchbase server in production   boston

24

Network sizing

3) Network
• Client traffic
• Replication (writes)
• Rebalancing

Reads+Writes

Replication (multiply writes) and Rebalancing
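The network factors can be estimated the same way. This sketch assumes steady-state, cluster-wide traffic with a single average document size, ignores protocol overhead, and leaves rebalancing out entirely since it is bursty; all names and numbers are illustrative.

```java
// Steady-state cluster network estimate: client traffic plus replication of writes.
class NetworkSizingSketch {
    static long bytesPerSecond(long readsPerSec, long writesPerSec,
                               long avgDocBytes, int replicas) {
        long clientTraffic = (readsPerSec + writesPerSec) * avgDocBytes;   // Reads+Writes
        long replicationTraffic = writesPerSec * replicas * avgDocBytes;   // writes multiplied per replica
        return clientTraffic + replicationTraffic;
    }
}
```

With 8,000 reads and 2,000 writes per second of 1,000-byte documents and one replica, `bytesPerSecond(8000, 2000, 1000, 1)` is 12,000,000 bytes per second, before any rebalance traffic is added on top.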

Page 24: Couchbase server in production   boston

25

Data Distribution

4) Data Distribution / Safety (assuming one replica):
• 1 node = BAD
• 2 nodes = …better…
• 3+ nodes = BEST!

Note: Many applications will need more than 3 nodes

Servers fail; be prepared. The more nodes, the less impact a failure will have.

Page 25: Couchbase server in production   boston

26

Data Distribution

(Diagram: APP SERVER 1 and APP SERVER 2 each run the COUCHBASE CLIENT LIBRARY with a CLUSTER MAP and send Read/Write/Update requests to the COUCHBASE SERVER CLUSTER. SERVER 1, SERVER 2 and SERVER 3 each hold a set of Active Docs and a set of Replica Docs.)
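The idea in the diagram, a client-side cluster map that tells the library which server holds the active and replica copy of each document, can be sketched as follows. This is an illustration only: the hash function, the partition count and the round-robin placement are assumptions chosen to keep the example short, not Couchbase's actual mapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative cluster map: key -> partition -> (active server, replica server).
class ClusterMapSketch {
    private final int[] activeServer;   // partition -> server index holding the active copy
    private final int[] replicaServer;  // partition -> server index holding the replica copy

    ClusterMapSketch(int partitions, int servers) {
        if (partitions < 1 || servers < 2) {
            throw new IllegalArgumentException("need >= 1 partition and >= 2 servers");
        }
        activeServer = new int[partitions];
        replicaServer = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            activeServer[p] = p % servers;          // spread active copies round-robin
            replicaServer[p] = (p + 1) % servers;   // replica always on a different server
        }
    }

    // Hashes the key to a partition; every client with the same map agrees on the result.
    int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % activeServer.length);
    }

    int activeFor(String key) {
        return activeServer[partitionFor(key)];
    }

    int replicaFor(String key) {
        return replicaServer[partitionFor(key)];
    }
}
```

Because the lookup is computed locally from the key and the map, each application server can send a read, write or update straight to the right node without asking a coordinator first.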

Page 26: Couchbase server in production   boston

27

How many nodes? (recap)

4 Key Factors determine number of nodes needed:

1) RAM
2) Disk
3) Network
4) Data Distribution

Couchbase Servers

Web application server

Application user

Page 27: Couchbase server in production   boston

28

MONITORING

Dev-Test Size Deploy Monitor Manage

Page 28: Couchbase server in production   boston

29

Key resources: RAM, Disk, Network

(Diagram: three Application Servers connected over the NETWORK to three Servers, each with RAM and DISK)

Page 29: Couchbase server in production   boston

30

Monitoring

Once in production, the heart of operations is monitoring:

- RAM usage
- Disk write queues / read activity
- Network bandwidth, replication queues
- Data distribution (balance, replicas)

Page 30: Couchbase server in production   boston

31

How do you know when your working set is not in RAM?

Application Server: “Give me document A”

Server: “Here is document A”

If document A is in memory
    return document A to the application
Else
    add document to read queue
    reader eventually loads document from disk into memory
    return document A to the application

(Diagram: an Application Server and a Server with RAM and DISK)

Cache Miss Ratio
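A cache miss ratio can be computed from two counters, sketched here under the assumption that you can read the total number of get operations and the number of those that had to be fetched from disk over the same interval. The method and parameter names are hypothetical, not names of server statistics.

```java
// Cache miss ratio: the fraction of reads that took the slow "else" path to disk.
class CacheMissSketch {
    static double cacheMissRatio(long gets, long diskFetches) {
        if (gets == 0) {
            return 0.0;   // no reads in the interval, so nothing missed
        }
        return (double) diskFetches / gets;
    }
}
```

A ratio that climbs over time suggests the working set no longer fits in RAM; what counts as too high depends on the application's latency requirements.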

Page 31: Couchbase server in production   boston

32

How do you know when you don’t have enough disk I/O?

Disk Write Queue

Page 32: Couchbase server in production   boston

33

How do you know when you don’t have enough network I/O?

TAP Replication Queue

Page 33: Couchbase server in production   boston

35

Page 34: Couchbase server in production   boston

36

MANAGEMENT AND MAINTENANCE

Dev-Test Size Deploy Monitor Manage

Page 35: Couchbase server in production   boston

37

Growth

Going from 5 million to 100 million users…

– RAM usage is growing:
    • Cache misses increasing
    • Resident item ratios decreasing
    • Disk fetches increasing

– Disk write queue growing higher than usual

Need to add a few more nodes...

…More RAM, disk and network without any downtime

Page 36: Couchbase server in production   boston

38

Add Nodes

(Diagram: APP SERVER 1 and APP SERVER 2 each run the COUCHBASE CLIENT LIBRARY with a CLUSTER MAP and send Read/Write/Update requests to the COUCHBASE SERVER CLUSTER. SERVER 4 and SERVER 5 are added alongside SERVER 1, SERVER 2 and SERVER 3; every server holds Active Docs and Replica Docs.)

Page 37: Couchbase server in production   boston

39

As simple as running a packaged script (cbbackup)

Done on live system with minimal to no performance impact

Backup

Page 38: Couchbase server in production   boston

40

1) Put the backup files back in place; the server will automatically “warm up” from the disk files upon restart

– Traditional RDBMS performance is acceptable while slowly populating cache

– Our applications demand a different level of performance

– Couchbase Server pre-loads as much as possible into RAM

Restore

warmup

Page 39: Couchbase server in production   boston

41

Restore

2) “cbrestore” used to restore data into live/different cluster

Data Files

cbrestore

Page 40: Couchbase server in production   boston

42

1. Add nodes of new version, rebalance…

2. Remove nodes of old version, rebalance…

3. Done!

No disruption

General use for software upgrade, hardware refresh, planned maintenance

Upgrade existing Membase 1.7 to Couchbase Server 1.8

Upgrade

Page 41: Couchbase server in production   boston

43

Current use of sqlite causes performance degradation as DB files get fragmented

- “vacuum” available (but not as an online operation)

- Best practice: Repeat rebalance to “clean” disk files

Under development: “Maintenance mode” to allow a node to be safely taken offline so vacuuming can be performed in place.

Couchbase Server 2.0 has much improved behavior

Disk fragmentation

Page 42: Couchbase server in production   boston

44

Failures Happen!

Hardware

Network

Bugs

Page 43: Couchbase server in production   boston

45

Easy to manage failures with Couchbase

• Failover (automatic or manual):

– Replica data promoted for immediate access

– Replicas not recreated

– Do NOT fail over a healthy node
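The failover behaviour listed above can be sketched on a simple partition table. This is a toy model with hypothetical names: each partition has one active and one replica server, failover promotes the replica wherever the failed server held the active copy, and no new replicas are created afterwards.

```java
import java.util.Arrays;

// Toy failover model: promote replicas, do not recreate them.
class FailoverSketch {
    static final int NONE = -1;  // marks a partition that currently has no copy in this role
    final int[] active;          // partition -> server holding the active copy
    final int[] replica;         // partition -> server holding the replica copy

    // Both arrays must have one entry per partition.
    FailoverSketch(int[] active, int[] replica) {
        this.active = active.clone();
        this.replica = replica.clone();
    }

    void failOver(int failedServer) {
        for (int p = 0; p < active.length; p++) {
            if (active[p] == failedServer) {
                active[p] = replica[p];   // replica data promoted for immediate access
                replica[p] = NONE;        // the promoted copy is now the only copy
            } else if (replica[p] == failedServer) {
                replica[p] = NONE;        // replica lost and not recreated
            }
        }
    }

    long partitionsWithoutReplica() {
        return Arrays.stream(replica).filter(r -> r == NONE).count();
    }
}
```

After a failover in this model, every partition the failed server touched is left with a single copy. That exposure is why failing over a healthy node is discouraged.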

Page 44: Couchbase server in production   boston

46

Fail Over

(Diagram: APP SERVER 1 and APP SERVER 2 each run the COUCHBASE CLIENT LIBRARY with a CLUSTER MAP and talk to the COUCHBASE SERVER CLUSTER of SERVER 1 through SERVER 5; every server holds Active Docs and Replica Docs.)

Page 45: Couchbase server in production   boston

47

Easy to maintain Couchbase

• Use remove+rebalance on “malfunctioning” node:

– Protects data distribution and “safety”

– Replicas recreated

– Best to “swap” with new node to maintain capacity

Page 46: Couchbase server in production   boston

48

QUESTIONS?