5 Pitfalls to Avoid with MongoDB
Learn how 5 of the most common MongoDB pitfalls can be avoided with Tokutek's TokuMX.
Tokutek: Database Performance Engines
What is Tokutek?
Tokutek® offers high performance and scalability for MySQL, MariaDB and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure.
Tokutek Performance Engines Remove Limitations
• Improve insertion performance by 20X
• Reduce HDD and flash storage requirements up to 90%
• No need to rewrite code
Tokutek Mission: Empower your database to handle the Big Data requirements of today’s applications
A Global Customer Base
Housekeeping
• This presentation will be available for replay following the event
• We welcome your questions; please use the console on the right of your screen and we will answer following the presentation
• A copy of the presentation is available upon request
Agenda
• Describe use cases that lead to well-known pitfalls
• How can they be avoided?
• Test, Measure, and Analyze (benchmark)
Pitfalls - 1982
Pitfalls - 2013
What is TokuMX?
• TokuMX = MongoDB with improved storage
• Drop-in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No Full Text or Geospatial
• Open Source
  • http://github.com/Tokutek/mongo
Pitfall 1 : Space
1a : Space
• MongoDB databases often grow quite large
  • It easily allows users to...
    • store large documents
    • keep them around for a long time
  • De-normalized data needs more space
• Operational challenges
  • Big disks are cheap, but not fast
  • Cloud storage is even slower
  • Fast disks (flash) are expensive
  • Backups are large as well
• Unfortunately, MongoDB does not offer compression
• goal = use less disk/flash
1a : Space : Avoidance
• TokuMX offers built-in compression
  • 3 compression algorithms
  • quicklz, zlib, lzma, (none)
• Everything is compressed
  • Field names and values
  • Secondary indexes too
• BitTorrent Peer Snapshot Data (~31 million documents)
  • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created
{ id: 1, peer_id: 9222, torrent_snapshot_id: 4, upload_speed: 0.0000, download_speed: 0.0000, payload_upload_speed: 0.0000, payload_download_speed: 0.0000, total_upload: 0, total_download: 0, fail_count: 0, hashfail_count: 0, progress: 0.0000, created: "2008-10-28 01:57:35" }
http://cs.brown.edu/~pavlo/torrent/
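A minimal sketch of why records like the sample above compress well. It uses Python's standard zlib and lzma modules on JSON text, which is an assumption made for illustration: TokuMX compresses its own on-disk format, not JSON, and quicklz is not in the standard library. The varying values are made up, so the ratios printed here say nothing about the benchmark results.

```python
import json
import lzma
import zlib

# Documents shaped like the peer snapshot sample; the varying values are invented.
docs = [
    {
        "id": i,
        "peer_id": 9222 + i % 50,
        "torrent_snapshot_id": 4,
        "upload_speed": 0.0,
        "download_speed": 0.0,
        "payload_upload_speed": 0.0,
        "payload_download_speed": 0.0,
        "total_upload": 0,
        "total_download": 0,
        "fail_count": 0,
        "hashfail_count": 0,
        "progress": 0.0,
        "created": "2008-10-28 01:57:35",
    }
    for i in range(1, 10001)
]

# One JSON document per line stands in for the stored data.
raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")

sizes = {
    "none": len(raw),
    "zlib": len(zlib.compress(raw)),
    "lzma": len(lzma.compress(raw)),
}

for name, size in sizes.items():
    print(f"{name:>4}: {size:>9} bytes  ratio {sizes['none'] / size:.1f}:1")
```

The repeated field names and near-constant values are exactly the kind of redundancy a general-purpose compressor removes.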
1a : Space : Test
1a : Space : Analyze
size on disk, ~31 million inserts (lower is better)
TokuMX achieved 11.6:1 compression
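As a quick worked conversion, a compression ratio of r:1 corresponds to saving 1 - 1/r of the original space. The helper name below is hypothetical.

```python
def space_saved(ratio: float) -> float:
    """Fraction of space saved for a compression ratio of `ratio`:1."""
    return 1.0 - 1.0 / ratio


print(f"{space_saved(11.6):.1%}")  # 91.4%
```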
1b : Space
• MongoDB stores field names in each document
  • Lots of redundant data
  • When field names are long, documents may contain more field name data than actual values
• Google “mongodb long field names”
  • Lots of blogs and advice
• ... but descriptive schemas are useful!
1b : Space : Avoidance
• Again, TokuMX offers built-in compression
  • Field names are compressed along with values
• Compression algorithms love redundant data
• Be descriptive and toss that data dictionary!
  • Who knows what is in field “zq”? Not me!
1b : Space : Test
schema 1 - long field names (10/20/20)
{ first_name : "Tim", last_name : "Callaghan", email_address : "[email protected]" }
schema 2 - short field names (26 fewer bytes per doc)
{ fn : "Tim", ln : "Callaghan", ea : "[email protected]" }
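The 26-byte figure can be checked directly, and a rough experiment shows how the gap behaves under compression. This is a sketch under assumptions: JSON text and zlib stand in for the real storage format, and the values (including the example.com addresses) are invented.

```python
import json
import zlib

LONG = ("first_name", "last_name", "email_address")
SHORT = ("fn", "ln", "ea")

# Bytes saved per document by the short names: 8 + 7 + 11 = 26.
saved_per_doc = sum(len(a) - len(b) for a, b in zip(LONG, SHORT))


def encode(names, n_docs):
    """Serialize n_docs documents that use the given field names."""
    docs = [
        dict(zip(names, (f"first{i}", f"last{i}", f"user{i}@example.com")))
        for i in range(n_docs)
    ]
    return "\n".join(json.dumps(d) for d in docs).encode("utf-8")


N = 10000
raw_long, raw_short = encode(LONG, N), encode(SHORT, N)
zip_long, zip_short = zlib.compress(raw_long), zlib.compress(raw_short)

print("saved per doc: ", saved_per_doc)
print("raw gap:       ", len(raw_long) - len(raw_short))
print("compressed gap:", len(zip_long) - len(zip_short))
```

Uncompressed, the gap is exactly 26 bytes for every document; comparing the two printed gaps shows how much of it survives compression.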
1b : Space : Analyze
size on disk, 100 million inserts (lower is better)
TokuMX is substantially smaller, even without compression
In TokuMX, field name length has almost no impact on size due to compression
MongoDB was ~10% smaller with short field names
Pitfall 2 : Replication
2 : Replication
• MongoDB natively supports replication
  • High availability
  • Read scaling
• Shortcomings
  • Lag, resource consumption on secondaries
• Recommended reading
  • http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/
2 : Replication : Avoidance
• TokuMX replication allows secondary servers to process replication without IO
  • Simply injecting messages into the Fractal Tree Indexes on the secondary server
• The “Hard Work” was done on the primary
  • Read-before-write
  • Uniqueness checking
• Elimination of replication lag
  • Your secondaries are fully available for read scaling!
• Run multiple secondaries on a single server
2 : Replication : Test
• Sysbench
• Workload
  • point + range queries, update, delete, insert
  • 16 collections, 10 million rows, 16GB RAM
• Setup
  • loaded data on single server
  • shut down and copied data folder
  • created secondary
• Ran benchmark
2 : Replication : Analyze
Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS
Pitfall 3 : Declining Performance
3 : Declining Performance
• MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory
• Operations are limited by IOPS
  • Generally 1 operation per available IO
  • Fewer with secondary indexes: maintaining each one costs 1 more IO
• Solution: Add RAM or Shard.
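The IO arithmetic above can be turned into a back-of-the-envelope estimate. This is a simplified model, not a measurement: it assumes one IO for the document plus one IO per secondary index, and the function name and the 1000 IOPS budget are hypothetical.

```python
def estimated_writes_per_second(iops: int, secondary_indexes: int) -> float:
    """Rough ceiling on writes/sec once the indexes no longer fit in memory.

    Assumes one IO for the document plus one IO per secondary index.
    """
    return iops / (1 + secondary_indexes)


# Hypothetical device delivering 1000 IOPS.
for n in (0, 1, 3):
    print(n, "secondary indexes:", estimated_writes_per_second(1000, n))
```

With three secondary indexes the estimate drops to a quarter of the raw IOPS budget.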
3 : Declining Performance : Avoidance
• TokuMX runs on Tokutek’s Fractal Tree indexes
  • Message buffers delay IO and reduce cache disruption
  • Perform many operations per IO
• Many workloads don’t need additional memory or sharding, they just need better indexing
  • RAM = $$$
  • Sharding = $$$ + Complexity
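The idea of performing many operations per IO can be illustrated with a toy write buffer. This is only a sketch of the batching idea, not how Fractal Tree indexes are actually implemented; the class name, buffer sizes, and the "one flush equals one IO" accounting are all assumptions.

```python
class BufferedIndex:
    """Toy write buffer: collect messages in memory, write them out in batches."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.buffer = []
        self.stored = []   # stands in for data on disk
        self.io_count = 0  # number of simulated disk writes

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Writing a whole buffer counts as a single simulated IO.
        if self.buffer:
            self.stored.extend(self.buffer)
            self.buffer.clear()
            self.io_count += 1


unbuffered = BufferedIndex(buffer_size=1)   # one IO per operation
buffered = BufferedIndex(buffer_size=100)   # many operations per IO
for key in range(1000):
    unbuffered.insert(key)
    buffered.insert(key)
unbuffered.flush()
buffered.flush()

print(unbuffered.io_count, buffered.io_count)  # 1000 10
```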
• Indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb
{ dateandtime: <date-time>,
cashregisterid: 1..1000,
customerid: 1..100000,
productid: 1..10000,
price: <double> }
• Insert only, 1000 documents per insert, 100 million inserts
• Indexes
  • price + customerid
  • cashregisterid + price + customerid
  • price + dateandtime + customerid
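A minimal sketch of what one insert batch for this workload might look like. It is not the iibench code: the function names, the price range, the ascending index directions, and reading "cashregister" as the cashregisterid field are assumptions.

```python
import random
from datetime import datetime, timezone


def make_document(rng: random.Random) -> dict:
    """One document with the fields and value ranges listed above."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # assumed range; the slide only says <double>
    }


def make_batch(rng: random.Random, size: int = 1000) -> list:
    """One insert batch (the workload uses 1000 documents per insert)."""
    return [make_document(rng) for _ in range(size)]


# Key order of the three secondary indexes as (field, direction) pairs.
# Ascending direction (1) is an assumption.
SECONDARY_INDEXES = [
    [("price", 1), ("customerid", 1)],
    [("cashregisterid", 1), ("price", 1), ("customerid", 1)],
    [("price", 1), ("dateandtime", 1), ("customerid", 1)],
]

batch = make_batch(random.Random(42))
print(len(batch), sorted(batch[0]))
```

Every inserted document touches all three secondary indexes, which is what makes this workload sensitive to index maintenance cost.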
3 : Declining Performance : Test
• 100mm inserts into a collection with 3 secondary indexes
3 : Declining Performance : Analyze
• 100mm inserts into a collection with 3 secondary indexes
• Array Index Insertion (100 values per document)
Pitfall 4 : Concurrency
4 : Concurrency
• MongoDB originally implemented a global write lock• 1 writer at a time
• MongoDB v2.2 moved this lock to the database level• 1 writer at a time in each database
• This severely limits the write performance of servers
• 36 shards on 1 server example
  • Allows for more concurrency
  • High operational complexity
  • Google “mongodb multiple shards same server”
• TokuMX performs locking at the document level
  • Extreme concurrency!
4 : Concurrency : Avoidance
[Diagram: lock granularity across the hierarchy instance > database > collection > document. MongoDB v2.0 locks at the instance level, MongoDB v2.2 at the database level, TokuMX at the document level.]
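The effect of lock granularity on write concurrency can be sketched with a small model: writes that need different locks can proceed at the same time. This is an illustration only, with hypothetical function names and made-up pending writes; it does not reproduce any server's real locking code.

```python
def lock_key(write, granularity):
    """Return the lock a write must hold under the given granularity."""
    database, collection, doc_id = write
    if granularity == "instance":   # MongoDB v2.0: one global write lock
        return ("instance",)
    if granularity == "database":   # MongoDB v2.2: one write lock per database
        return (database,)
    if granularity == "document":   # TokuMX: one lock per document
        return (database, collection, doc_id)
    raise ValueError(f"unknown granularity: {granularity}")


def max_concurrent_writers(writes, granularity):
    """Writes holding different locks can run at the same time."""
    return len({lock_key(w, granularity) for w in writes})


# Hypothetical pending writes: (database, collection, document id).
writes = [
    ("shop", "orders", 1),
    ("shop", "orders", 2),
    ("shop", "customers", 7),
    ("logs", "events", 1),
]

for g in ("instance", "database", "document"):
    print(g, max_concurrent_writers(writes, g))
```

For these four writes the model allows 1, 2, and 4 concurrent writers respectively.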
• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb
{ _id: 1..10000000, k: 1..10000000, c: <120 char random string ###-###-###>, pad: <60 char random string ###-###-###>}
4 : Concurrency : Test
4 : Concurrency : Analyze
Pitfall 5 : Transactions
5 : Got Transactions?
• MongoDB does not support “transactions”
  • Each operation is visible to everyone
  • There are work-arounds, Google “mongodb transactions”
• http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
  This document provides a pattern for doing multi-document updates or “transactions” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback like functionality.
  (the document is 8 web pages long)
• MongoDB does not support multi-version concurrency control (MVCC)
• Readers do not get a consistent view of the data, as they can be interrupted by writers
• People try, Google “mongodb mvcc”
• ACID
  • In MongoDB, multi-insertion operations allow for partial success
    • Asked to store 5 documents, 3 succeeded
  • TokuMX offers “all or nothing” behavior
    • Document level locking
• MVCC
  • In MongoDB, queries can be interrupted by writers
    • The effects of these writers are visible to the reader
  • TokuMX offers MVCC
    • Reads are consistent as of the operation start
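The difference between partial success and "all or nothing", and the idea of a read that stays consistent as of its start, can be sketched with a toy in-memory collection. This is a simplified model with hypothetical class and method names; it does not reproduce the behavior of any real driver or server.

```python
import copy


class ToyCollection:
    """In-memory stand-in for a collection with a unique _id index."""

    def __init__(self):
        self.docs = {}

    def insert_one(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self.docs[doc["_id"]] = doc

    def insert_many_partial(self, docs):
        """Stop at the first failure and keep whatever was already stored."""
        for doc in docs:
            self.insert_one(doc)

    def insert_many_atomic(self, docs):
        """All or nothing: restore the previous state if any insert fails."""
        before = dict(self.docs)
        try:
            for doc in docs:
                self.insert_one(doc)
        except KeyError:
            self.docs = before
            raise

    def snapshot(self):
        """Consistent view as of now; later writes do not change it."""
        return copy.deepcopy(self.docs)


# Five documents; the fourth repeats an _id and fails.
batch = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": 3}, {"_id": 4}]

partial = ToyCollection()
try:
    partial.insert_many_partial(batch)
except KeyError:
    pass
print(sorted(partial.docs))  # [1, 2, 3]

atomic = ToyCollection()
try:
    atomic.insert_many_atomic(batch)
except KeyError:
    pass
print(sorted(atomic.docs))  # []

view = atomic.snapshot()          # reader starts here
atomic.insert_one({"_id": 99})    # writer arrives afterwards
print(99 in view, 99 in atomic.docs)  # False True
```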
5 : Transactions : Avoidance
• Transactions in TokuMX
  • db.runCommand("beginTransaction")
  • ... perform 1 or more operations
  • db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")
• Note: not available in sharded environments
• For more information
  • http://www.tokutek.com/2013/04/mongodb-transactions-yes/
  • http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
Any Questions?
Download TokuMX at www.tokutek.com/download
Register for product updates, access to premium content, and invitations at www.tokutek.com
Join the Conversation