forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

65
A Next Generation Storage Engine for NoSQL Database Systems Chiyoung Seo Software Architect, Couchbase Inc. Chin Hong VP Product Management, Couchbase Inc.

Upload: mark-laptin

Post on 15-Aug-2015


Page 1: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

A Next Generation Storage Engine for NoSQL Database Systems

Chiyoung Seo, Software Architect, Couchbase Inc.

Chin Hong, VP Product Management, Couchbase Inc.

Page 2: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

©2014 Couchbase, Inc. ©2015 Couchbase Inc. 2

Contents

- Why a new KV storage engine?
- ForestDB Overview
  - Compact Index Structures
  - WAL (Write-Ahead Logging)
  - Optimizations for SSDs (Solid-State Drives)
- Performance Evaluations
  - LevelDB, RocksDB
  - WiredTiger (B+Tree, LSM Tree)
- Summary

Page 3: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Why a new KV storage engine?

Page 4: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Modern Web / Mobile Applications

- Operate on huge volumes of unstructured data
- A significant amount of new data is constantly generated by hundreds of millions of users or devices
- Still require high performance and scalability in managing their ever-growing databases
- The underlying storage engine is one of the most critical parts of a database system for providing high performance and scalability

Page 5: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

B+Tree

- Main storage index structure in the database field: SQLite, Couchstore, WiredTiger
- Generalization of the binary search tree
  - Each node consists of two or more {key, value (or pointer)} pairs
  - Fanout (or branch) degree: # of KV pairs in a node
  - Node size is generally fitted to a multiple of the page size

Page 6: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

B+Tree Limitations

- Not suitable for indexing long keys, whether variable- or fixed-length
  - Significant space overhead, as entire key strings are indexed in non-leaf nodes
  - Tree depth grows quickly as more data is loaded
- In-place updates lead to database fragmentation
  - I/O performance degrades significantly as the data size gets bigger and the database becomes fragmented
- Several variants of B+Tree were proposed. The most popular is the LSM Tree.

[Figure: a B+Tree node with Key and Value (or Pointer) fields, annotated "longer keys"]

Page 7: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

LSM Tree (Log Structured Merge Tree)

- Main file organization for many products: HBase, Cassandra, LevelDB, MongoDB (WiredTiger-LSM)
- Improve write performance by
  - Appending all updated and new data to a sequential log
  - Deferring and batching index changes efficiently in sorted runs

[Figure: an in-memory tree and a sequential log; flush/merge into the C1 tree, then merge into the C2 tree, and so on. Capacity increases exponentially from one tree to the next.]

Page 8: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

LSM Limitations

- Not suitable for indexing long keys, whether variable- or fixed-length
  - Significant space overhead, as entire key strings are indexed in non-leaf nodes
  - Tree depth grows quickly as more data is loaded, so merge operations between trees occur more frequently
- Reads are generally slower, as the system may need to traverse multiple trees to find a record

[Figure: an in-memory tree and a sequential log; flush/merge into the C1 tree, then merge into the C2 tree, and so on. Capacity increases exponentially from one tree to the next.]
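
The read cost named in the LSM limitations can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how HBase, Cassandra, LevelDB, or WiredTiger are implemented: `ToyLSM`, `memtable_limit`, and the probe counter are hypothetical names, sorted Python lists stand in for the on-disk trees, and merging between runs is left out.

```python
import bisect


class ToyLSM:
    """Toy LSM: an in-memory table plus sorted runs, newest run first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []                    # each run: sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: the batched changes become one new sorted run.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Return (value, number of structures probed)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for run in self.runs:             # newest to oldest
            probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes


db = ToyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    db.put(k, v)
print(db.get("a"))  # the oldest key is found only after probing every run
```

A key written long ago sits in the oldest run, so a lookup probes the in-memory table and every newer run before reaching it.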

Page 9: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Goals for Next-Generation Storage Engine

- Fast and scalable index structure for long keys, whether variable- or fixed-length
  - Targeting block I/O storage devices: not only SSDs but also legacy HDDs
- Less storage space overhead
  - Reduce write amplification
- Efficient for different key patterns
  - Keys with or without common prefixes
- Efficient for mixed workloads

Page 10: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB

Page 11: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB

- Key-value storage engine developed by the Couchbase Caching / Storage team
- Its main index structure is built from a Hierarchical B+-Tree based Trie, or HB+-Trie
  - HB+-Trie was originally presented at the ACM SIGMOD 2011 Programming Contest by Jung-Sang Ahn, who works at Couchbase (http://db.csail.mit.edu/sigmod11contest/sigmod_2011_contest_poster_jungsang_ahn.pdf)
- Significantly better read and write performance with less storage overhead
- Supports various server OSs (CentOS, Ubuntu, Debian, Mac OS X, Windows) and mobile OSs (iOS, Android)
- 1.0 beta was released in October 2014

Page 12: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Main Features

- Multi-Version Concurrency Control (MVCC) with append-only storage model
- Write-Ahead Logging (WAL)
- A value can be retrieved by its sequence number or disk offset in addition to a key
- Custom compare function to support a customized key order
- Snapshot support to provide different views of the database
- Rollback to revert the database to a specific point
- Ranged iteration by keys or sequence numbers
- Transactional support with read-committed or read-uncommitted isolation level
- Manual or auto compaction configured per KV instance

Page 13: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB: Main Index Structure

Page 14: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

HB+Trie (Hierarchical B+Tree based Trie)

- Trie (prefix tree) whose nodes are B+Trees
- A key is split into a list of fixed-size chunks (sub-strings of the key)

[Figure: a variable-length key (a83jgls83jgo29a…) is split into fixed-size (e.g. 4-byte) chunks: Chunk1 = a83j, Chunk2 = gls8, Chunk3 = 3jgo, … The search uses Chunk1 in the first B+Tree (a node of the HB+Trie), then Chunk2 and Chunk3 in the following B+Trees, until it reaches the document. Traversal is in lexicographical order.]
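
The chunking step can be sketched in a few lines. This is an illustration only: `split_chunks` is a hypothetical helper, and how ForestDB itself handles a final chunk shorter than the chunk size is not covered by the slide.

```python
def split_chunks(key, chunk_size=4):
    """Split a key into fixed-size chunks; the last chunk may be shorter."""
    return [key[i:i + chunk_size] for i in range(0, len(key), chunk_size)]


# The visible part of the slide's example key, with 4-byte chunks.
print(split_chunks(b"a83jgls83jgo"))
```

Each chunk then serves as the lookup key in one B+Tree along the path from the root of the trie.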

Page 15: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

- As in the original trie, each node (B+Tree) is created on demand (except for the root node)
- Example: chunk size = 1 byte

[Figure: insert 'aaaa'. The root B+Tree uses the 1st chunk as its key.]

Page 16: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: 'aaaa' is inserted into the root B+Tree, which uses the 1st chunk as its key: a → aaaa. The key is distinguishable by its first chunk 'a'.]

Page 17: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: insert 'bbbb'. It is distinguishable by its first chunk 'b', so the root B+Tree (1st chunk as key) now holds a → aaaa and b → bbbb.]

Page 18: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: insert 'aaab'. The root B+Tree (1st chunk as key) holds a → aaaa and b → bbbb; the new key cannot be distinguished from 'aaaa' using the first chunk 'a'.]

Page 19: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: insert 'aaab'. It cannot be distinguished from 'aaaa' using the first chunk 'a'. First distinguishable chunk: 4th.]

Page 20: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: a new B+Tree is created that uses the 4th chunk as its key, skipping the common prefix 'aa'; the skipped common prefix 'aa' is stored in the new node.
Root B+Tree (1st chunk): a → new B+Tree, b → bbbb.
New B+Tree (4th chunk, skipped prefix 'aa'): a → aaaa, b → aaab.]

Page 21: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: insert 'bbcd'. It cannot be distinguished from 'bbbb' using the first chunk 'b'.
Root B+Tree (1st chunk): a → B+Tree using the 4th chunk as key, skipping common prefix 'aa' (a → aaaa, b → aaab); b → bbbb.]

Page 22: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Prefix Compression

[Figure: insert 'bbcd'. It cannot be distinguished from 'bbbb' using the first chunk 'b'. First distinguishable chunk: 3rd.
Root B+Tree (1st chunk): a → B+Tree using the 4th chunk as key, skipping common prefix 'aa' (a → aaaa, b → aaab); b → bbbb.]

Page 23: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

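Prefix Compression

- As in the original trie, each node (B+Tree) is created on demand (except for the root node)
- Example: chunk size = 1 byte

[Figure: a new B+Tree is created that uses the 3rd chunk as its key, skipping the common prefix 'b'; the skipped common prefix 'b' is stored in the new node.
Root B+Tree (1st chunk): a → B+Tree (4th chunk, skipped prefix 'aa'), b → B+Tree (3rd chunk, skipped prefix 'b').
B+Tree (4th chunk): a → aaaa, b → aaab.
B+Tree (3rd chunk): b → bbbb, c → bbcd.]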
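
The insertion sequence on the prefix compression slides can be replayed with a small model. This is a sketch under stated assumptions, not ForestDB's code: a Python dict stands in for each B+Tree, `Node`, `insert`, and `lookup` are hypothetical names, chunk numbers are 0-based (the slides count from 1), and keys are assumed to be equal-length strings.

```python
CHUNK = 1  # chunk size in bytes, matching the slide example


class Node:
    """One HB+Trie node. A dict stands in for the node's B+Tree."""

    def __init__(self, chunk_no, skipped=""):
        self.chunk_no = chunk_no  # 0-based index of the chunk used as key here
        self.skipped = skipped    # common prefix skipped just before this node
        self.entries = {}         # chunk -> stored key, or chunk -> child Node


def chunk(s, i):
    return s[i * CHUNK:(i + 1) * CHUNK]


def insert(node, key):
    c = chunk(key, node.chunk_no)
    slot = node.entries.get(c)
    if slot is None:                        # distinguishable by this chunk
        node.entries[c] = key
    elif isinstance(slot, Node):
        start = (node.chunk_no + 1) * CHUNK
        seg = key[start:start + len(slot.skipped)]
        if seg == slot.skipped:             # shares the skipped prefix: descend
            insert(slot, key)
        else:                               # diverges inside the skipped prefix
            j = 0
            while chunk(seg, j) == chunk(slot.skipped, j):
                j += 1
            mid = Node(node.chunk_no + 1 + j, slot.skipped[:j * CHUNK])
            mid.entries[chunk(slot.skipped, j)] = slot
            mid.entries[chunk(key, mid.chunk_no)] = key
            slot.skipped = slot.skipped[(j + 1) * CHUNK:]
            node.entries[c] = mid
    elif slot != key:                       # same chunk as an existing key
        i = node.chunk_no + 1
        while chunk(slot, i) == chunk(key, i):
            i += 1                          # first distinguishable chunk
        child = Node(i, key[(node.chunk_no + 1) * CHUNK:i * CHUNK])
        child.entries[chunk(slot, i)] = slot
        child.entries[chunk(key, i)] = key
        node.entries[c] = child             # created on demand


def lookup(node, key):
    while True:
        slot = node.entries.get(chunk(key, node.chunk_no))
        if slot is None:
            return False
        if not isinstance(slot, Node):
            return slot == key
        start = (node.chunk_no + 1) * CHUNK
        if key[start:start + len(slot.skipped)] != slot.skipped:
            return False
        node = slot


root = Node(0)
for k in ["aaaa", "bbbb", "aaab", "bbcd"]:
    insert(root, k)
print(root.entries["a"].chunk_no, root.entries["a"].skipped)
print(root.entries["b"].chunk_no, root.entries["b"].skipped)
```

After the four inserts, the child under 'a' indexes chunk index 3 (the 4th chunk) with skipped prefix 'aa', and the child under 'b' indexes chunk index 2 (the 3rd chunk) with skipped prefix 'b', as on the slides. The branch that splits a skipped prefix goes beyond what the slides show and is included only so that later inserts stay correct.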

Page 24: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Compact Index Structure

- When keys have common prefixes (e.g., secondary index keys)

Page 25: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Compact Index Structure

- When keys have common prefixes (e.g., secondary index keys)
- When keys are sufficiently long and uniformly random (e.g., UUID or hash value)
  - The majority of keys can be indexed by the first chunk
  - There will be only one B+Tree in the HB+Trie
  - We don't need to store and compare entire key strings

Example: chunk size = 2 bytes

Insert a83jfl2iejzm302k, dpwk3gjrieorigje, z9382h3igor8eh4k, 283hgoeir8goerha, 023o8f9o8zufisue

[Figure: a single B+Tree keyed on the 1st chunk]

1st chunk   Key
a8          a83jfl2iejzm302k
dp          dpwk3gjrieorigje
z9          z9382h3igor8eh4k
28          283hgoeir8goerha
02          023o8f9o8zufisue

Page 26: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Compact Index Structure - Benefits

Suppose that
- Node size: 4 KB / key length: 64 bytes / pointer (or value) size: 8 bytes
- Indexing 1 billion keys

                             Original B+Tree                   HB+Trie (4-byte chunk)
Fanout                       4096 / (64+8) ~= 56               4096 / (4+8) ~= 341
Height                       log_56(1000^3) ~= 6               log_341(1000^3) ~= 4
Space needed for the index   4 KB * (# of nodes) ~= 2139.07 GB   4 KB * (# of nodes) ~= 151.70 GB

The HB+Trie index is 14.1 times smaller.

- Compaction overhead can be reduced significantly
- Buffer cache can accommodate more pages and manage them more efficiently
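
The arithmetic in the benefits table can be checked with a short script. This is a sketch under one stated assumption: the node-count expression did not survive in the transcript, so the script counts the nodes of a full tree, (fanout^height - 1) / (fanout - 1), which reproduces the slide's space figures when 1 GB is taken as 2^30 bytes. `index_estimate` and the constant names are hypothetical.

```python
NODE_SIZE = 4096        # 4 KB node
PTR_SIZE = 8            # pointer (or value) size in bytes
N_KEYS = 1000 ** 3      # 1 billion keys


def index_estimate(key_bytes):
    """Return (fanout, height, index size in GB) for a given indexed key size."""
    fanout = NODE_SIZE // (key_bytes + PTR_SIZE)
    height = 1
    while fanout ** height < N_KEYS:    # smallest height that can hold all keys
        height += 1
    # Assumption: node count of a full tree with this fanout and height.
    nodes = (fanout ** height - 1) // (fanout - 1)
    return fanout, height, NODE_SIZE * nodes / 2 ** 30


btree = index_estimate(64)   # whole 64-byte key stored in every index entry
hbtrie = index_estimate(4)   # only a 4-byte chunk stored in every index entry
print(btree, hbtrie, btree[2] / hbtrie[2])
```

The only input that differs between the two columns is the number of key bytes stored per index entry, which is what drives the higher fanout, the lower height, and the smaller index.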

Page 27: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB: Write-Ahead Logging

Page 28: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB Index Structures

- ForestDB maintains two index structures
  - HB+Trie: key index
  - Sequence B+Tree: sequence number (8-byte integer) index
- Retrieve the file offset of a value using its key or sequence number

[Figure: the DB file is a sequence of documents (Doc Doc Doc …). The HB+Trie maps a key, and the B+Tree maps a sequence number, to a document in the file.]

Page 29: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

- Append updates first, and update the main indexes later
- Main purposes
  - To maximize write throughput by sequential writes (append-only updates)
  - To reduce the # of index nodes to be written, by batching updates

[Figure: the DB file holds docs, index nodes (HB+Trie nodes), and a DB header H (1 block). The WAL indexes (ID index and seq no. index) are in-memory structures (hash tables).]

Page 30: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

[Figure: document updates (Docs) arrive. The DB file holds docs, index nodes (HB+Trie nodes), and a DB header H (1 block); the WAL indexes (ID index and seq no. index) are in-memory hash tables.]

Page 31: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

[Figure: append documents. The updated docs are appended to the DB file after the existing docs, index nodes, and header H.]

Page 32: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

[Figure: update WAL indexes. The ID index maps h(key) to the offset of each appended doc, and the seq no. index maps h(seq no) to the same offsets. Both are in-memory hash tables.]

Page 33: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

[Figure: append a DB header for every commit. A new header H is appended to the DB file after the appended docs; the WAL indexes keep the h(key) → offset and h(seq no) → offset entries.]

Page 34: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Ahead Logging

- Append updates first, and update the main indexes later
- Main purposes
  - To maximize write throughput by sequential writes (append-only updates)
  - To reduce the # of index nodes to be written, by batching updates

Key query
1. Look up the WAL index first
2. On a hit, return immediately
3. On a miss, look up the HB+Trie (or B+Tree)

[Figure: the DB file with docs, index nodes, appended docs, and a DB header H appended for every commit; the in-memory WAL indexes map h(key) and h(seq no) to offsets.]
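
The write path and the key query can be put together in a toy model. This is a minimal sketch, not ForestDB's implementation: `ToyStore` and `wal_threshold` are hypothetical names, a list stands in for the append-only DB file, a dict stands in for the HB+Trie, the sequence-number index is left out, and flushing the WAL index into the main index at commit time once it reaches a threshold is an assumption about when "later" happens.

```python
class ToyStore:
    def __init__(self, wal_threshold=4096):
        self.file = []         # append-only DB file: docs, index batches, headers
        self.wal_index = {}    # in-memory hash table: key -> offset in self.file
        self.main_index = {}   # stand-in for the HB+Trie: key -> offset
        self.wal_threshold = wal_threshold

    def set(self, key, value):
        self.file.append(("doc", key, value))      # append the document
        self.wal_index[key] = len(self.file) - 1   # update the WAL index

    def commit(self):
        if len(self.wal_index) >= self.wal_threshold:
            # Batched update of the main index: one write for many docs.
            self.main_index.update(self.wal_index)
            self.file.append(("index", dict(self.wal_index)))
            self.wal_index.clear()
        self.file.append(("header",))              # DB header for every commit

    def get(self, key):
        offset = self.wal_index.get(key)           # 1. WAL index first
        if offset is None:
            offset = self.main_index.get(key)      # 3. on a miss, main index
        if offset is None:
            return None
        return self.file[offset][2]                # 2. on a hit, read the doc


store = ToyStore(wal_threshold=2)
store.set("k1", "v1")
store.commit()
print(store.get("k1"))
```

Every write is an append, and the main index is touched once per batch rather than once per document.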

Page 35: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Optimizations for Solid-State Drives

Page 36: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Current Limitations

- OS file system stack overhead
  - Metadata update
  - Page cache shared among processes
- Compaction overhead
  - Need to read an entire database file and write all valid pages into a new file
  - Uses too much disk I/O bandwidth
- Parallel channels inside the SSD are not utilized
  - Fetch multiple blocks stored in different channels at the same time
  - Use an async I/O library (e.g., libaio)

Page 37: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

OS File System Stack Overhead

[Figure: two storage stacks, top to bottom.
Typical Database Storage Stack: Database Storage Engine → OS File System (Page Cache, Meta Data Mgmt) → Block I/O Interface (SATA, PCI) → SSDs.
Advanced Database Storage Stack: Database Storage Engine (Buffer Cache, Volume Manager, …) → Block I/O Interface (SATA, PCI) → SSDs.]

Page 38: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Database Compaction

- Required for the append-only storage model
  - Garbage-collects stale data blocks
- Uses significant disk I/O bandwidth
  - Reads the entire database file and writes all valid blocks into a new file
- Affects other performance metrics
  - Regular read / write performance drops significantly
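
The cost of compaction is visible even in a toy version. This is a sketch with hypothetical names (`compact`, `db_file`, `index`), where a list of (key, value) records stands in for the append-only file and a dict maps each key to the offset of its live version.

```python
def compact(db_file, index):
    """Copy the live records of an append-only file into a new file."""
    new_file, new_index = [], {}
    for offset, (key, value) in enumerate(db_file):   # read the entire file
        if index.get(key) == offset:                  # still the live version?
            new_index[key] = len(new_file)
            new_file.append((key, value))             # write valid blocks only
    return new_file, new_index


old_file = [("a", 1), ("b", 2), ("a", 3)]   # ("a", 1) is stale
old_index = {"a": 2, "b": 1}
print(compact(old_file, old_index))
```

Every record is read once and every live record is written once, which is the I/O that competes with regular reads and writes.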

Page 39: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

SWAT-Based Compaction Optimization

- A logical page can change its physical address in flash memory whenever it is overwritten
- For this reason, the mapping table between LBA and PBA is maintained by the Flash Translation Layer (FTL)

[Figure: logical addresses in the file system (LBA): A B C D E F … are translated by the FTL address mapping (LBA → PBA) to physical addresses in flash memory (PBA); an overwritten page A is written to a new physical page A'.]

Page 40: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

SWAT-Based Compaction Optimization

- A new compacted file can be created simply by creating new LBA-to-PBA mappings that cover only the valid pages in the current DB file
- Need to extend the FTL by adding a new interface, SWAT (Swap and Trim)

[Figure: legend: Document; B+Tree (node of HB+Trie); B+Tree node; old version of B+Tree node.
Current DB file: nodes A to I, followed by the updated versions C', F', H', I'.
New compacted DB file: only the valid pages A, B, D, E, G, C', F', H', I'.]
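
The remapping idea can be modelled with a dict as the FTL mapping table. This is a toy illustration only: `swat_compact` and its parameters are hypothetical and do not reproduce the actual SWAT interface or OpenSSD FTL code; it shows only that compaction becomes a mapping update with no page copies.

```python
def swat_compact(ftl, old_file_lbas, valid_lbas, new_file_start):
    """Toy Swap-and-Trim: build the new file by remapping, then trim the old one.

    ftl maps LBA -> PBA. Returns the set of physical pages that became free.
    """
    old_pbas = {ftl[lba] for lba in old_file_lbas}
    # Swap: consecutive LBAs of the new file point at the physical pages
    # that already hold the valid data, so nothing is copied.
    new_map = {new_file_start + i: ftl[lba] for i, lba in enumerate(valid_lbas)}
    # Trim: drop every mapping of the old file.
    for lba in old_file_lbas:
        del ftl[lba]
    ftl.update(new_map)
    return old_pbas - set(new_map.values())


# Old file pages at LBA 0..3 hold A, B, C, C'; C (LBA 2) is stale.
ftl = {0: 100, 1: 101, 2: 102, 3: 103}
freed = swat_compact(ftl, [0, 1, 2, 3], [0, 1, 3], new_file_start=10)
print(ftl, freed)
```

Only the stale page's physical address is released; the valid pages stay where they are and are merely reachable under new logical addresses.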

Page 41: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

SWAT-Based Compaction Optimization

- Implemented the SWAT interface on the OpenSSD development platform by adapting its FTL code
- Total time taken for compactions was reduced by 17x
- Number of compactions triggered was reduced by 4x

Page 42: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Utilizing Parallel Channels on SSDs

- Exploit an async I/O library (e.g., libaio) to better utilize the parallel I/O capabilities of SSDs
- Quite useful in querying secondary indexes when the items satisfying a query predicate are located in multiple blocks on different channels
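
The effect of issuing several block reads at once can be sketched without libaio. The example below swaps in a thread pool with `os.pread` (available on Unix) in place of a true async I/O library; `read_blocks`, `BLOCK_SIZE`, and `max_in_flight` are hypothetical names, and whether the requests actually land on different SSD channels depends on the device.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096


def read_blocks(path, block_numbers, max_in_flight=8):
    """Read several blocks of one file with multiple requests in flight."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
            # pread takes an explicit offset, so threads can share one fd.
            return list(pool.map(
                lambda b: os.pread(fd, BLOCK_SIZE, b * BLOCK_SIZE),
                block_numbers))
    finally:
        os.close(fd)
```

Results come back in the order of `block_numbers`, so a secondary-index query could request all matching blocks in one call and process them as a batch.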

Page 43: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB: Evaluation

ForestDB, LevelDB, RocksDB

Page 44: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB Evaluation: LevelDB, RocksDB

- Evaluation environment
  - 64-bit machine running CentOS 6.5
  - Intel Xeon 2.00 GHz CPU (6 cores, 12 threads)
  - 32 GB RAM and Crucial M4 SSD
- Data
  - Key size 32 bytes and value size 1 KB
  - Load 100M items
  - Logical data size 100 GB total

Page 45: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

KV Storage Engine Configurations

- LevelDB
  - Compression is disabled
  - Write buffer size: 256 MB (initial load), 4 MB (otherwise)
  - Buffer cache size: 8 GB
- RocksDB
  - Compression is disabled
  - Write buffer size: 256 MB (initial load), 4 MB (otherwise)
  - Maximum number of background compaction threads: 8
  - Maximum number of background memtable flushes: 8
  - Maximum number of write buffers: 8
  - Buffer cache size: 8 GB (uncompressed)
- ForestDB
  - Compression is disabled
  - WAL size: 4,096 documents
  - Buffer cache size: 8 GB

Page 46: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Initial Load Performance

[Figure: initial load performance. Annotation: 3x ~ 6x less time.]

Page 47: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Read-Only Performance

[Figure: Read-Only Performance. Series: ForestDB, LevelDB, RocksDB. X axis: # reader threads (1, 2, 4, 8). Y axis: operations per second (0 to 30000). Annotation: 2x ~ 5x.]

Page 48: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Only Performance

[Figure: Write-Only Performance. Series: ForestDB, LevelDB, RocksDB. X axis: write batch size in # documents (1, 4, 16, 64, 256). Y axis: operations per second (0 to 12000). Annotation: 3x ~ 5x.]

- Small batch sizes (e.g., < 10) are not common

Page 49: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Only Performance

[Figure: Write Amplification. Series: ForestDB, LevelDB, RocksDB. X axis: write batch size in # documents (1, 4, 16, 64, 256). Y axis: write amplification, normalized to a single doc size (0 to 450).]

ForestDB shows 4x ~ 20x less write amplification

Page 50: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Mixed Workload Performance

[Figure: Mixed (Unrestricted) Performance. Series: ForestDB, LevelDB, RocksDB. X axis: # reader threads (1, 2, 4, 8). Y axis: operations per second (0 to 12000). Annotation: 2x ~ 5x.]

Page 51: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB: Evaluation

ForestDB, WiredTiger (B+Tree, LSM Tree)

Page 52: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Evaluation Environment

- CPU: Intel Core i7-3770 @ 3.40 GHz (8 virtual cores)
- RAM: 32 GB (DDR3, 1600 MHz)
- OS: Ubuntu 12.04.5 LTS (Linux version 3.8.0-29-generic)
- Disk: Samsung SSD 840 EVO (formatted with Ext4)
- Benchmark: ForestDB-Benchmark
- WiredTiger version: 2.5.0

Page 53: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Initial Load: Insertions / Sec

- Key size: 32 bytes on average
- Document size: 128 bytes on average
- # of documents: 200,000,000 (more than 40 GB DB size)
- Cache size: 16 GB
- Asynchronous writes

[Figure: Bulk Load. Series: FDB, WT LSM, FDB (avg), WT LSM (avg). X axis: elapsed time in seconds (0 to 8000). Y axis: insertions per second (0 to 180000). Annotation: 2.5x faster.]

Note: WiredTiger B+Tree is excluded due to its slow speed

Page 54: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Read-Only Performance (small document + no DGM)

- Key size: 32 bytes on average
- Document size: 128 bytes on average
- # of documents: 10,000,000 (1.2 GB DB size)
- Cache size: 16 GB

[Figure: Read-Only Throughput (Small). Series: ForestDB, WT B-tree, WT LSM. X axis: # reader threads (1, 2, 4, 8, 16). Y axis: operations per second (0 to 3000000). Annotation: 1.5x to 3x slower.]

Page 55: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Read-Only Performance (large document + DGM + long key)

- Key size: 256 bytes, 1024 bytes on average
- Document size: 1 KB on average
- # of documents: 10,000,000 (11 GB DB size)
- RAM size: 2 GB
- Cache size: 512 MB

Page 56: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Read-Only Performance (large document + DGM + long key)

Read-Only Throughput (Key: 256 bytes), operations per second

# reader threads   1         2         4          8          16
ForestDB           9382.94   14291.1   20673.55   27054.92   29730.66
WT B-tree          4132.66   7778.51   13924.61   22880.54   29805.06
WT LSM             3160.17   4351.8    7452.94    12130.47   15675.62

2x - 3x faster

Note that the disk is fully utilized with 16 threads

Page 57: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Read-Only Performance (large document + DGM + long key)

Read-Only Throughput (Key: 1024 bytes), operations per second

# reader threads   1         2         4          8          16
ForestDB           5626.42   9604.27   14774.53   19178.67   19858.83
WT B-tree          190       391.05    607.21     698.01     729.8
WT LSM             589.4     793.14    851.37     857.18     858.91

24x - 27x faster

Page 58: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Only Performance

- Key size: 48 bytes on average
- Document size: 1 KB on average
- # of documents: 10,000,000 (11 GB DB size)
- RAM size: 2 GB
- Cache size: 512 MB

Page 59: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Only Throughput (Synchronous)

[Figure: Write-Only Throughput (Synchronous). Series: ForestDB, WT B-tree, WT LSM. X axis: batch size per commit (1, 4, 16, 64, 256). Y axis: operations per second (0 to 20000). Annotation: 3x to 6x faster.]

Page 60: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Write-Only Amplification (Synchronous)

[Figure: Write Amplification (Synchronous). Series: ForestDB, WT B-tree, WT LSM. X axis: batch size per commit (1, 4, 16, 64, 256). Y axis: write amplification, log scale (1 to 1000). Annotation: 3x - 20x less amplification.]

Data labels (one chart series per row; batch sizes 1, 4, 16, 64, 256):

12.1    6.4     4.8    4.4    4.4
13.1    11.9    11.7   11.7   11.7
126.6   124.2   62.8   31.5   32.4

Page 61: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Mixed Workload Performance

- Key size: 48 bytes on average
- Document size: 1 KB on average
- # of documents: 10,000,000 (11 GB DB size)
- RAM size: 2 GB
- Cache size: 512 MB
- Single writer thread and multiple reader threads
- Writer batch size: 16 documents on average

Page 62: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Mixed Workload (Read: 80%, Write: 20%)

[Figure: Mixed Workloads (R:W = 8:2). Series: ForestDB, WT B-tree, WT LSM. X axis: # reader threads (1, 2, 4, 8, 16). Y axis: operations per second (0 to 25000). Annotation: 2x - 8x faster.]

Page 63: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

Summary

Page 64: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01

ForestDB - Summary

- Compact and efficient storage for a variety of data: HB+Trie
- In-memory WAL indexes to improve write/read performance
- Optimized for new SSD storage technology
  - Bypassing the OS file system, SWAT-based compaction, parallel I/O channels
- Unified storage engine that performs well for various workloads
- Unified storage engine that scales from small devices to large servers
  - Couchbase Server secondary index
  - Couchbase Lite
  - Couchbase Server KV engine

Page 65: forestdb-nextgenerationstorageengine-150413140327-conversion-gate01


Questions?

[email protected]@couchbase.com