boltdb - an embedded key value database

Manoj Awasthi, Tech Architect @Tokopedia

Upload: manoj-awasthi

Post on 10-Jan-2017


Page 1: Boltdb - an embedded key value database

Manoj Awasthi, Tech Architect @Tokopedia

Boltdb - an embedded key value database

Page 2: Boltdb - an embedded key value database

Structure of this talk..

Page 3: Boltdb - an embedded key value database

A bit of history..

[Diagram: the tokopedia image router keeps a mapping from image to server (123.jpg : server_03; 246.jpg : server_02; 345.jpg : server_17; ...) and routes each request to one of Image Server 1, Image Server 2, ..., Image Server N.]

.. as time passed

Page 4: Boltdb - an embedded key value database

Gradually, we started storing newer images in s3:// ..

• All images uploaded from that point onwards could be served from a single server.

• No such mapping (mongodb) was required for them.

• Old images were still served the same way and did need the mapping.

• But now the database was “read only” and of fixed size.

Also: mongodb suffered frequent memory spikes and was killed by the linux “out of memory killer”, which led to both latency and downtime.

Page 5: Boltdb - an embedded key value database

Search for alternative..

Requirements boiled down to:

• Fast retrieval - needed all across

• Scalable - to tens of thousands of queries per second

• Persistent - don’t have to recompute everything from scratch on each bootup (in case!)

Also, we can do with:

• Read only usage - not a constraint but this could help in “trade off”

Page 6: Boltdb - an embedded key value database

Why not redis?

Redis! Well, it could work well given our fixed data size and read only usage.

In fact, we did try it and saw scaling problems with redis (high cpu load).

Also $$.

We needed a lightweight embedded database .. “BoltDB”, an embedded key value database written in golang, looked interesting.

Page 7: Boltdb - an embedded key value database

Why boltdb?

Compact, fast.

Based on LMDB [0].

Both use a B+ tree for storage, maintain ACID semantics with fully serializable transactions, and support many other database features.

Simple

While LMDB focuses on raw performance, Boltdb is focused on ease of use.

Fits better for a “read heavy” usage (read more, write less).

Written in golang, so it fits well with the rest of the stack at Tokopedia.

[0] https://symas.com/products/lightning-memory-mapped-database/

Page 8: Boltdb - an embedded key value database

Why boltdb?

In the traditional sense, boltdb is not really a database but simply a memory mapped file. But it provides ACID semantics and other properties associated with databases, so calling it a DB is not a misnomer.

No installation required

● It comes as a library
● Installation is as simple as importing it in your go program

Page 9: Boltdb - an embedded key value database

Opening the database..

Add a key value

Fetch a value by key

Page 10: Boltdb - an embedded key value database

bolt - command line utility

bolt is a tool for inspecting bolt databases.

Things to use it for:

• Check the integrity of a bolt database

• Run synthetic benchmarks against a bolt database to gauge read and write performance

• Print basic info about a database

• Generate useful statistics on all pages in the database

Available under cmd/bolt in the github repository.

Page 11: Boltdb - an embedded key value database

Caveat: random writes slow down as the db grows!

Let’s get back to the problem we were solving.

The raw data from mongodb, exported using the mongoexport utility, was ~4G.

This translated to a ~13G boltdb database file.

The tool we wrote to load the mongo export into boltdb became much slower as the database grew. Hence we used sharding to horizontally partition the data from mongo into many small files, with a smaller boltdb file for each of them.

Page 12: Boltdb - an embedded key value database

The result!

Following is the output of `free -m` on one of the servers we use:

Snippet of `top` output from the same server:

Page 13: Boltdb - an embedded key value database

Limitations

Bolt is good for read intensive workloads. Random writes can be slow.

Bolt uses a B+ tree internally, so there can be a lot of random page access. SSDs provide a significant performance boost over spinning disks.

Bolt can handle databases much larger than the available physical RAM, provided its memory map fits in the process address space. This may be problematic on 32-bit systems.

The data structures used by bolt are memory mapped and hence endian specific. This means that you cannot copy a bolt file from a little-endian machine to a big-endian machine and have it work. (Most modern CPUs are little endian.)
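
The endianness point can be illustrated with the standard library alone. This sketch only shows how the same bytes decode under the two byte orders; it does not reproduce bolt's actual on-disk layout, and decodeBoth is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeBoth interprets the first four bytes of raw as a uint32
// in little-endian and in big-endian byte order.
func decodeBoth(raw []byte) (little, big uint32) {
	return binary.LittleEndian.Uint32(raw), binary.BigEndian.Uint32(raw)
}

func main() {
	// The value 1 as a little-endian machine would lay it out in memory.
	raw := []byte{0x01, 0x00, 0x00, 0x00}
	little, big := decodeBoth(raw)
	fmt.Println(little, big) // prints "1 16777216"
}
```

A memory mapped file is read back exactly as these raw bytes, so a machine with the other byte order would see different values for the same file.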

Page 14: Boltdb - an embedded key value database

Conclusion

Boltdb worked pretty well for our use case.

The service handles many thousands of queries per second, is not limited by physical RAM, and is doing well! :D

Do give it a try if it fits your use case.

References:

[1] https://github.com/boltdb/bolt

[2] http://tech.tokopedia.com/blog/using-boltdb-as-a-fast-persistent-kv-store/

[3] https://symas.com/products/lightning-memory-mapped-database/

Page 15: Boltdb - an embedded key value database

Connect with me over:

{
  “Email”: “[email protected]”,
  “Twitter”: “https://twitter.com/awmanoj”,
  “Linkedin”: “https://www.linkedin.com/in/manojawasthi”,
  “Github”: “https://github.com/awmanoj/”,
  “Blog”: [ “http://awmanoj.github.io/”, “http://www.manojawasthi.com” ]
}

Thank you!