Scaling with MongoDB

Rick Copeland @rick446, Arborian Consulting, LLC

Who am I?

Now a consultant, but formerly…

Software engineer at SourceForge, early adopter of MongoDB (version 0.8)

Wrote the SQLAlchemy book (I love SQL when it’s used well)

Mainly write Python now, but have done C++, C#, Java, JavaScript, VHDL, Verilog, …

Scaling without MongoDB

You can do it with an RDBMS as long as you…

- Don’t use joins
- Don’t use transactions
- Use read-only slaves
- Use memcached
- Denormalize your data
- Use custom sharding/partitioning
- Do a lot of vertical scaling (we’re going to need a bigger box)

Vertical scaling

Developer productivity

Scaling with MongoDB

Use documents to improve locality

Optimize your indexes

Be aware of your working set

Scaling your disks

Replication for fault-tolerance and read scaling

Sharding for read and write scaling

Terminology

Relational (SQL)    MongoDB

Database            Database

Table               Collection

Index               Index (B-tree, range-based)

Row                 Document (think JSON)

Column              Field (primitive types + arrays, documents)

Dynamic typing

Documents improve locality

    {
      title: "Slides for Scaling with MongoDB",
      author: "Rick Copeland",
      date: ISODate("2012-02-29T19:30:00Z"),
      text: "My slides are available on speakerdeck.com",
      comments: [
        { author: "anonymous",
          date: ISODate("2012-02-29T19:30:01Z"),
          text: "Frist psot!" },
        { author: "mark",
          date: ISODate("2012-02-29T19:45:23Z"),
          text: "Nice slides" }
      ]
    }

Embed comment data in blog post document

Disk seeks and data locality

Seek = 5+ ms

Read = really really fast


[Diagram: a Post, its Author, and its Comments as separate records on disk, shown first with one Comment and then with five]
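To make the seek arithmetic concrete, here is a minimal sketch that counts seeks for a normalized layout versus an embedded document. It assumes the worst case, where every separately stored record costs one seek and nothing is cached or adjacent on disk. The dictionaries and function names are hypothetical stand-ins, not MongoDB APIs.

```python
SEEK_MS = 5  # lower bound from the slide: "Seek = 5+ ms"

# Normalized layout: post, author, and comments are separate records.
posts = {1: {"title": "Slides for Scaling with MongoDB", "author_id": 10}}
authors = {10: {"name": "Rick Copeland"}}
comments = {i: {"post_id": 1, "text": f"comment {i}"} for i in range(5)}


def read_normalized(post_id):
    """Return the number of seeks needed to assemble one post (assumed: one per record)."""
    seeks = 1  # fetch the post
    post = posts[post_id]
    _author = authors[post["author_id"]]
    seeks += 1  # fetch the author
    found = [c for c in comments.values() if c["post_id"] == post_id]
    seeks += len(found)  # one seek per comment record
    return seeks


# Embedded layout: author and comments live inside the post document.
embedded_posts = {
    1: {
        "title": "Slides for Scaling with MongoDB",
        "author": "Rick Copeland",
        "comments": [{"text": f"comment {i}"} for i in range(5)],
    }
}


def read_embedded(post_id):
    """One document, one seek; the rest is a sequential read."""
    _doc = embedded_posts[post_id]
    return 1


normalized_ms = read_normalized(1) * SEEK_MS
embedded_ms = read_embedded(1) * SEEK_MS
```

Under these assumptions the normalized read costs 7 seeks (at least 35 ms) and the embedded read costs 1 seek (at least 5 ms).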

Avoid full collection scans

1 2 3 4 5 6 7

Looked at 7 objects

Find where x equals 7

Indexing: use a tree lookup

[Diagram: index tree over the values 1 through 7]

Looked at 3 objects

Find where x equals 7
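The two slides above can be sketched by counting how many objects each strategy examines. This is an illustration only: a binary search over a sorted list stands in for the B-tree, and the function names are made up for the example.

```python
def scan(values, target):
    """Full collection scan: examine objects one by one until a match."""
    looked_at = 0
    for value in values:
        looked_at += 1
        if value == target:
            break
    return looked_at


def tree_lookup(sorted_values, target):
    """Binary search, standing in for a balanced tree descent."""
    lo, hi = 0, len(sorted_values) - 1
    looked_at = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        looked_at += 1
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return looked_at


values = [1, 2, 3, 4, 5, 6, 7]
scanned = scan(values, 7)         # 7 objects
indexed = tree_lookup(values, 7)  # 3 objects: 4, then 6, then 7
```

The gap widens with size: a scan grows linearly with the collection, while the tree lookup grows with its logarithm.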

Randomly-distributed index

Entire index must fit in RAM

Right-aligned index

Only a small portion must be in RAM
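A rough way to see the difference is to count how many index pages the most recent inserts touch. The sketch below assumes a hypothetical index that packs 100 keys per page and maps a key to a page by integer division; real B-tree pages behave differently, but the contrast holds.

```python
import random

KEYS_PER_PAGE = 100      # hypothetical page capacity
TOTAL_KEYS = 1_000_000   # hypothetical index size
RECENT_INSERTS = 1_000


def pages_touched(keys):
    """Number of distinct index pages hit by the given keys."""
    return len({key // KEYS_PER_PAGE for key in keys})


# Right-aligned: keys only ever increase (timestamps, ObjectIds, counters),
# so recent inserts all land at the right-hand edge of the index.
right_aligned_keys = range(TOTAL_KEYS - RECENT_INSERTS, TOTAL_KEYS)

# Randomly distributed: recent inserts can land anywhere in the index.
rng = random.Random(0)
random_keys = [rng.randrange(TOTAL_KEYS) for _ in range(RECENT_INSERTS)]

hot_pages_right = pages_touched(right_aligned_keys)
hot_pages_random = pages_touched(random_keys)
```

With these numbers the right-aligned inserts stay within 10 pages, while the random inserts spread over hundreds of pages, which is why the whole index ends up needing to be resident.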

Be aware of your working set

Working set = sizeof(frequently used data) + sizeof(frequently used indexes)

Right-aligned indexes reduce working set size

Working set should fit in available RAM for best performance

Page faults are the biggest cause of performance loss in MongoDB

Quantify your working set

    > db.foo.stats()
    {
        "ns" : "test.foo",
        "count" : 1338330,
        "size" : 46915928,
        "avgObjSize" : 35.05557523181876,
        "storageSize" : 86092032,
        "numExtents" : 12,
        "nindexes" : 2,
        "lastExtentSize" : 20872960,
        "paddingFactor" : 1,
        "flags" : 0,
        "totalIndexSize" : 99860480,
        "indexSizes" : {
            "_id_" : 55877632,
            "x_1" : 43982848
        },
        "ok" : 1
    }

size: data size

avgObjSize: average doc size

storageSize: size on disk (or RAM!)

totalIndexSize: size of all indexes

indexSizes: size of each index
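As a worked example, the numbers from the stats output above give an upper bound on the working set: all data plus all indexes. The real working set is only the frequently used part, so treat this as a ceiling. The helper names and RAM figures below are assumptions for illustration.

```python
# Relevant fields copied from the db.foo.stats() output above (bytes).
stats = {
    "size": 46915928,
    "storageSize": 86092032,
    "totalIndexSize": 99860480,
    "indexSizes": {"_id_": 55877632, "x_1": 43982848},
}


def working_set_upper_bound(collection_stats):
    """All data plus all indexes: the most the working set could be."""
    return collection_stats["size"] + collection_stats["totalIndexSize"]


def fits_in_ram(collection_stats, ram_bytes):
    """True if even the worst-case working set fits in the given RAM."""
    return working_set_upper_bound(collection_stats) <= ram_bytes


upper_bound = working_set_upper_bound(stats)  # 146776408 bytes
fits_1_gib = fits_in_ram(stats, 1024 ** 3)
fits_64_mib = fits_in_ram(stats, 64 * 1024 ** 2)
```

Note that the indexes here are about twice the size of the data, so index choice dominates this collection's memory footprint.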

Scaling your disks: single disk

~200 seeks / second

Scaling your disks: RAID-0

~200 seeks / second on each of three striped disks

Faster, but less reliable

Scaling your disks: RAID-10

~400 seeks / second on each of three striped mirrored pairs

Faster and more reliable ($$$ though)
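The seek figures on these slides follow from the 5 ms seek time quoted earlier. The sketch below is back-of-the-envelope arithmetic, assuming seeks spread evenly across stripes and that either disk of a mirrored pair can serve a read; real arrays will not scale this cleanly.

```python
SEEK_MS = 5.0  # "Seek = 5+ ms", so this is a best case


def seeks_per_second(seek_ms):
    """Upper bound on random seeks one disk can do per second."""
    return 1000.0 / seek_ms


def raid0_seeks(disks, seek_ms=SEEK_MS):
    """Striping: each disk seeks independently (assumed even spread)."""
    return disks * seeks_per_second(seek_ms)


def raid10_read_seeks(pairs, seek_ms=SEEK_MS):
    """Striped mirrors: either disk of a pair can serve a read."""
    return pairs * 2 * seeks_per_second(seek_ms)


single_disk = seeks_per_second(SEEK_MS)  # 200.0
raid0_total = raid0_seeks(3)             # 600.0
raid10_total = raid10_read_seeks(3)      # 1200.0
```

RAID-0 buys throughput at the cost of reliability, since losing any one disk loses the array; RAID-10 doubles the disk count to get both.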

Replication

Primary: Read / Write

Secondary: Read

Secondary: Read

Old and busted: master/slave replication

The new hotness: replica sets with automatic failover

Replication

Primary handles all writes

Application optionally sends reads to secondaries

Heartbeat manages automatic failover

Replication: the oplog

Special collection (the oplog) records operations idempotently

Secondaries read from primary oplog and replay operations locally

Space is preallocated and fixed for the oplog

Replication: the oplog

    {
        "ts" : Timestamp(1317653790000, 2),
        "h" : -6022751846629753359,
        "op" : "i",
        "ns" : "confoo.People",
        "o" : {
            "_id" : ObjectId("4e89cd1e0364241932324269"),
            "first" : "Rick",
            "last" : "Copeland"
        }
    }

"op" : "i" is an insert

"ns" is the collection name

"o" is the object to insert
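Idempotent replay means a secondary can apply the same oplog entry more than once and end up in the same state. Here is a minimal sketch, assuming a toy in-memory database (a dict of collections keyed by `_id`) and handling only insert (`"i"`) and delete (`"d"`) entries; the real oplog has more operation types and the function names are hypothetical.

```python
import copy


def apply_op(db, entry):
    """Apply one oplog entry to an in-memory database, idempotently."""
    collection = db.setdefault(entry["ns"], {})
    if entry["op"] == "i":
        # Insert is applied as "set by _id", so repeating it is harmless.
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif entry["op"] == "d":
        # Deleting an already-missing document is a no-op.
        collection.pop(entry["o"]["_id"], None)
    else:
        raise ValueError("unsupported op: %r" % entry["op"])


def replay(db, oplog):
    """Replay every entry in order, as a secondary would."""
    for entry in oplog:
        apply_op(db, entry)
    return db


oplog = [
    {
        "op": "i",
        "ns": "confoo.People",
        "o": {"_id": "4e89cd1e0364241932324269", "first": "Rick", "last": "Copeland"},
    }
]

secondary = {}
replay(secondary, oplog)
after_first_replay = copy.deepcopy(secondary)
replay(secondary, oplog)  # replaying again changes nothing
```

This property is what lets a secondary resume from roughly where it left off without tracking its position exactly.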

Replication: failover

Use heartbeat signal to detect failure

When primary can’t be reached, elect a new one

Replica that’s the most up-to-date is chosen

If there is skew, changes not on new primary are saved to a .bson file for manual reconciliation

Application can require data to be replicated to a majority to ensure this doesn’t happen
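The failover rules above can be sketched in a few lines. This is a deliberately simplified model, not the actual election protocol: each member is reduced to the number of the last operation it applied, an election needs a reachable majority, and the member names are made up.

```python
def elect_primary(optimes, reachable):
    """Pick the most up-to-date reachable member, if a majority is reachable.

    optimes: dict of member name -> number of the last operation applied.
    reachable: set of member names that can currently see each other.
    """
    if 2 * len(reachable) <= len(optimes):
        return None  # no majority, so no primary can be elected
    return max(sorted(reachable), key=lambda name: optimes[name])


def majority_acknowledged(op_number, optimes):
    """True if a majority of members have applied the operation."""
    acks = sum(1 for applied in optimes.values() if applied >= op_number)
    return 2 * acks > len(optimes)


# Member "a" was primary and had applied op 12 when it became unreachable.
optimes = {"a": 12, "b": 10, "c": 11}
new_primary = elect_primary(optimes, reachable={"b", "c"})

# Ops the old primary had that the new primary never saw (the "skew").
skewed_ops = list(range(optimes[new_primary] + 1, optimes["a"] + 1))
```

In this example "c" becomes primary and op 12 is the skew. Op 11 had reached a majority before the failure, so it survives; op 12 had not, which is the case that waiting for a majority is meant to prevent.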

Replica sets: advanced topics

Priority
- Slower nodes with lower priority
- Backup or read-only nodes to never be primary

slaveDelay
- Fat-finger protection

Data center awareness and tagging
- Application can ensure complex replication guarantees
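For illustration, here is roughly what a replica set configuration using these options might look like, written as a Python dictionary. The host names, set name, and tag values are hypothetical, and the field names (`priority`, `hidden`, `slaveDelay`, `tags`) should be checked against the documentation for your server version.

```python
replica_set_config = {
    "_id": "rs1",  # hypothetical replica set name
    "members": [
        # Fast node, preferred as primary.
        {"_id": 0, "host": "db1.example.com:27017", "priority": 2,
         "tags": {"dc": "primary"}},
        # Slower node: can become primary, but is less preferred.
        {"_id": 1, "host": "db2.example.com:27017", "priority": 1,
         "tags": {"dc": "primary"}},
        # Backup node: never primary, applies operations an hour late
        # so a fat-fingered drop can still be recovered from it.
        {"_id": 2, "host": "backup.example.com:27017", "priority": 0,
         "hidden": True, "slaveDelay": 3600,
         "tags": {"dc": "secondary"}},
    ],
}


def can_become_primary(member):
    """Members with priority 0 are never elected (default priority is 1)."""
    return member.get("priority", 1) > 0


eligible_hosts = [m["host"] for m in replica_set_config["members"]
                  if can_become_primary(m)]
delayed_hosts = [m["host"] for m in replica_set_config["members"]
                 if m.get("slaveDelay", 0) > 0]
```

The delayed member is also given priority 0 here, since a node that is deliberately behind should not be elected primary.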

When replication is not enough

Reads scale nicely
- As long as the working set fits in RAM
- … and you don’t mind eventual consistency

Sharding to the rescue!
- Automatically partitioned data sets
- Scale writes and reads
- Automatic load balancing between the shards

Sharding architecture

Shard 1 (0..10): Primary, Secondary, Secondary

Shard 2 (10..20): Primary, Secondary, Secondary

Shard 3 (20..30): Primary, Secondary, Secondary

Shard 4 (30..40): Primary, Secondary, Secondary

MongoS, MongoS

Configuration: Config 1, Config 2, Config 3
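The routing job that mongos does in this picture can be sketched with a sorted list of range boundaries. This is a toy model using the four ranges from the diagram, assuming integer keys and ranges that include their lower bound and exclude their upper bound; the function names are hypothetical.

```python
import bisect

LOWER_BOUNDS = [0, 10, 20, 30]  # lower bound of each shard's range
SHARDS = ["Shard 1", "Shard 2", "Shard 3", "Shard 4"]
MAX_KEY = 40  # exclusive upper bound of the last range


def route(key):
    """Return the single shard that owns this shard key value."""
    if not 0 <= key < MAX_KEY:
        raise KeyError("key %r is outside every shard range" % key)
    return SHARDS[bisect.bisect_right(LOWER_BOUNDS, key) - 1]


def route_range(low, high):
    """Return every shard that may hold keys in the inclusive range [low, high]."""
    first = bisect.bisect_right(LOWER_BOUNDS, low) - 1
    last = bisect.bisect_right(LOWER_BOUNDS, high) - 1
    return SHARDS[first:last + 1]


point_query = route(7)             # one shard
boundary_query = route(10)         # lower bound belongs to the next shard
range_query = route_range(5, 25)   # three shards
```

A query that includes the shard key touches only the shards whose ranges overlap it; a query without the shard key has to be sent to every shard.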

Sharding setup

Sharding is per-collection and range-based

The highest-impact (and hardest to change) decision you make is the shard key
- Random keys: good for writes, bad for reads
- Right-aligned index: bad for writes
- Small # of discrete keys: very bad
- Ideal: balance writes, make reads routable by mongos

Optimal shard key selection is hard
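The trade-offs in this list show up clearly when you count writes per shard. The sketch below assumes four shards split evenly over keys 0 to 999 and three made-up key patterns; it ignores chunk splitting and balancing, which would move ranges around over time.

```python
import bisect
import random
from collections import Counter

SPLIT_POINTS = [250, 500, 750]  # hypothetical: 4 shards over keys 0..999


def shard_for(key):
    """Index (0..3) of the shard whose range contains the key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def writes_per_shard(keys):
    """Count how many writes land on each shard."""
    return Counter(shard_for(key) for key in keys)


rng = random.Random(0)

# Random keys: writes spread over all shards, but a range read hits them all.
random_load = writes_per_shard(rng.randrange(1000) for _ in range(1000))

# Right-aligned (always increasing) keys: every new write hits the last shard.
increasing_load = writes_per_shard(range(900, 1000))

# Only two distinct key values: at most two shards can ever be used.
few_keys_load = writes_per_shard(rng.choice([100, 600]) for _ in range(1000))
```

The increasing keys put all 100 writes on one shard, and the two-valued key can never use more than two shards no matter how many are added.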

Example sharding setup

                 Primary Data Center        Secondary Data Center

RS1 (Shard 1)    Priority 1, Priority 1     Priority 0

RS2 (Shard 2)    Priority 1, Priority 1     Priority 0

RS3 (Shard 3)    Priority 1, Priority 1     Priority 0

Config: Config 1, Config 2, Config 3

Sharding benefits

Writes and reads both scale (with good choice of shard key)

Reads scale while remaining strongly consistent

Partitioning ensures you get more usable RAM

Pitfall: don’t wait too long to add capacity

Questions?

Rick Copeland @rick446, Arborian Consulting, LLC
