scaling with mongodb

Scaling with MongoDB Rick Copeland @rick446 Arborian Consulting, LLC

Upload: rick-copeland

Post on 15-Jan-2015


DESCRIPTION

MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.

TRANSCRIPT

Page 1: Scaling with MongoDB

Scaling with MongoDB

Rick Copeland @rick446 Arborian Consulting, LLC

Page 2: Scaling with MongoDB

Who am I?

Now a consultant, but formerly…

Software engineer at SourceForge, early adopter of MongoDB (version 0.8)

Wrote the SQLAlchemy book (I love SQL when it’s used well)

Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …

Page 3: Scaling with MongoDB

Scaling without MongoDB

You can do it with an RDBMS as long as you…
  Don’t use joins
  Don’t use transactions
  Use read-only slaves
  Use memcached
  Denormalize your data
  Use custom sharding/partitioning
  Do a lot of vertical scaling
    ▪ (we’re going to need a bigger box)

Page 4: Scaling with MongoDB

Vertical scaling

Page 5: Scaling with MongoDB

Developer productivity

Page 6: Scaling with MongoDB

Scaling with MongoDB

Use documents to improve locality

Optimize your indexes

Be aware of your working set

Scaling your disks

Replication for fault-tolerance and read scaling

Sharding for read and write scaling

Page 7: Scaling with MongoDB

Terminology

Relational (SQL)    MongoDB       Notes
Database            Database
Table               Collection    Dynamic typing
Index               Index         B-tree (range-based)
Row                 Document      Think JSON
Column              Field         Primitive types + arrays, documents

Page 8: Scaling with MongoDB

Documents improve locality

    {
      title: "Slides for Scaling with MongoDB",
      author: "Rick Copeland",
      date: ISODate("2012-02-29T19:30:00Z"),
      text: "My slides are available on speakerdeck.com",
      comments: [
        { author: "anonymous",
          date: ISODate("2012-02-29T19:30:01Z"),
          text: "Frist psot!" },
        { author: "mark",
          date: ISODate("2012-02-29T19:45:23Z"),
          text: "Nice slides" }
      ]
    }

Embed comment data in blog post document

Page 9: Scaling with MongoDB

Disk seeks and data locality

Seek = 5+ ms Read = really really fast

Page 10: Scaling with MongoDB

Disk seeks and data locality

Post

Author

Comment

Page 11: Scaling with MongoDB

Disk seeks and data locality

Post

Author

Comment

Comment

Comment

Comment

Comment
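
The two diagrams above can be turned into rough arithmetic. The sketch below is only a back-of-the-envelope model, not a measurement: it assumes a cold cache, uses the slide's 5 ms lower bound for a seek, and assumes that in the normalized layout the post, the author, and every comment each cost one seek. The function names are made up for illustration.

```javascript
// Rough seek-cost model. 5 ms per seek is the slide's lower bound (assumption: cold cache).
const SEEK_MS = 5;

// Normalized layout: post, author, and each comment live in separate
// records, so in the worst case each one costs its own seek.
function normalizedSeekMs(commentCount) {
  const seeks = 1 /* post */ + 1 /* author */ + commentCount;
  return seeks * SEEK_MS;
}

// Embedded layout: author and comments are stored inside the post
// document, so a single seek is followed by a fast sequential read.
function embeddedSeekMs() {
  return 1 * SEEK_MS;
}

console.log(normalizedSeekMs(5)); // 35
console.log(embeddedSeekMs());    // 5
```

With five comments the normalized layout pays for seven seeks while the embedded document pays for one, which is the locality argument the slides are making.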

Page 12: Scaling with MongoDB

Avoid full collection scans

1 2 3 4 5 6 7

Looked at 7 objects

Find where x equals 7

Page 13: Scaling with MongoDB

Indexing: use a tree lookup

[Diagram: tree index over the values 1 through 7]

Looked at 3 objects

Find where x equals 7
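
The "looked at 7" versus "looked at 3" counts from these two slides can be reproduced with a small sketch. It models the index as a binary search over sorted keys, which is a simplification: MongoDB's indexes are B-trees, and the function names here are illustrative only.

```javascript
// Full collection scan: examine values in storage order until the match.
function scanLookup(values, target) {
  let looked = 0;
  for (const v of values) {
    looked += 1;
    if (v === target) return { found: true, looked };
  }
  return { found: false, looked };
}

// Index lookup, modeled as a binary search over sorted keys
// (a stand-in for walking a balanced tree from the root).
function indexLookup(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let looked = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    looked += 1;
    if (sortedKeys[mid] === target) return { found: true, looked };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, looked };
}

const xs = [1, 2, 3, 4, 5, 6, 7];
console.log(scanLookup(xs, 7).looked);  // 7
console.log(indexLookup(xs, 7).looked); // 3
```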

Page 14: Scaling with MongoDB

Randomly-distributed index

Entire index must fit in RAM

Page 15: Scaling with MongoDB

Right-aligned index

Only small portion in RAM

Page 16: Scaling with MongoDB

Be aware of your working set

Working set = sizeof(frequently used data) + sizeof(frequently used indexes)

Right-aligned indexes reduce working set size

Working set should fit in available RAM for best performance

Page faults are the biggest cause of performance loss in MongoDB

Page 17: Scaling with MongoDB

Quantify your working set

    > db.foo.stats()
    {
        "ns" : "test.foo",
        "count" : 1338330,
        "size" : 46915928,
        "avgObjSize" : 35.05557523181876,
        "storageSize" : 86092032,
        "numExtents" : 12,
        "nindexes" : 2,
        "lastExtentSize" : 20872960,
        "paddingFactor" : 1,
        "flags" : 0,
        "totalIndexSize" : 99860480,
        "indexSizes" : {
            "_id_" : 55877632,
            "x_1" : 43982848
        },
        "ok" : 1
    }

size: Data size

avgObjSize: Average doc size

storageSize: Size on disk (or RAM!)

totalIndexSize: Size of all indexes

indexSizes: Size of each index
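
As a sketch of how these numbers might be used, the snippet below adds the data size and the total index size from the stats output on this slide to get an upper bound on the working set, then compares it with available RAM. Treating all data and all indexes as "frequently used" is a pessimistic assumption, and the helper names and RAM figures are hypothetical.

```javascript
// Subset of the db.foo.stats() output shown on the slide.
const stats = {
  size: 46915928,
  storageSize: 86092032,
  totalIndexSize: 99860480,
  indexSizes: { _id_: 55877632, x_1: 43982848 },
};

// Upper bound on the working set: assume every document and every
// index page is frequently used (real working sets are often smaller).
function workingSetUpperBound(s) {
  return s.size + s.totalIndexSize;
}

// True when the pessimistic working set fits in the given amount of RAM.
function fitsInRam(s, ramBytes) {
  return workingSetUpperBound(s) <= ramBytes;
}

const MB = 1024 * 1024;
console.log(workingSetUpperBound(stats)); // 146776408 bytes, roughly 140 MB
console.log(fitsInRam(stats, 256 * MB));  // true
console.log(fitsInRam(stats, 128 * MB));  // false
```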

Page 18: Scaling with MongoDB

Scaling your disks: single disk

~200 seeks / second

Page 19: Scaling with MongoDB

Scaling your disks: RAID-0

~200 seeks / second ~200 seeks / second ~200 seeks / second

Faster, but less reliable

Page 20: Scaling with MongoDB

Scaling your disks: RAID-10

~400 seeks / second ~400 seeks / second ~400 seeks / second

Faster and more reliable ($$$ though)
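
The per-stripe figures on the three disk slides add up as follows. This is an idealized sketch that assumes random reads spread evenly across stripes and that either disk of a mirrored pair can serve a read; writes in RAID-10 must hit both mirrors, so they do not get the doubling. The function names are illustrative.

```javascript
// Slide figure for a single spinning disk.
const SEEKS_PER_DISK = 200;

// RAID-0: each stripe is one disk, so seeks scale with the stripe count.
function raid0Seeks(stripes) {
  return stripes * SEEKS_PER_DISK;
}

// RAID-10: each stripe is a mirrored pair, and a read can be served
// by either copy, so read seeks per stripe double.
function raid10ReadSeeks(stripes) {
  return stripes * 2 * SEEKS_PER_DISK;
}

console.log(raid0Seeks(1));      // 200  (single disk)
console.log(raid0Seeks(3));      // 600  (three striped disks)
console.log(raid10ReadSeeks(3)); // 1200 (three mirrored pairs)
```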

Page 21: Scaling with MongoDB

Replication

Primary

Secondary

Secondary

Read / Write

Read

Read

Old and busted: master/slave replication

The new hotness: replica sets with automatic failover

Page 22: Scaling with MongoDB

Replication

Primary handles all writes

Application optionally sends reads to slaves

Heartbeat manages automatic failover

Page 23: Scaling with MongoDB

Replication: the oplog

Special collection (the oplog) records operations idempotently

Secondaries read from primary oplog and replay operations locally

Space is preallocated and fixed for the oplog

Page 24: Scaling with MongoDB

Replication: the oplog

    {
        "ts" : Timestamp(1317653790000, 2),
        "h" : -6022751846629753359,
        "op" : "i",
        "ns" : "confoo.People",
        "o" : {
            "_id" : ObjectId("4e89cd1e0364241932324269"),
            "first" : "Rick",
            "last" : "Copeland"
        }
    }

"op" : "i": Insert

"ns": Collection name

"o": Object to insert
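
To illustrate what "records operations idempotently" buys you, here is a small in-memory sketch of replaying oplog-style entries. It is a simplification, not MongoDB's replication code: the collection is a plain `Map`, only the `i`, `u`, and `d` operation types are handled, updates carry only `$set`, and the `visits` field and integer `_id` are invented for the example.

```javascript
// Apply one oplog-style entry to an in-memory collection (a Map keyed by _id).
function applyOp(collection, entry) {
  if (entry.op === "i") {
    // Insert behaves like an upsert, so replaying it is harmless.
    collection.set(entry.o._id, { ...entry.o });
  } else if (entry.op === "u") {
    // o2 identifies the document, o carries the fields to set.
    const doc = collection.get(entry.o2._id) || { _id: entry.o2._id };
    collection.set(entry.o2._id, { ...doc, ...entry.o.$set });
  } else if (entry.op === "d") {
    collection.delete(entry.o._id);
  }
}

const oplog = [
  { op: "i", ns: "confoo.People", o: { _id: 1, first: "Rick", last: "Copeland", visits: 0 } },
  // An increment is logged as the resulting value, not as "+1",
  // which is what makes the entry safe to apply more than once.
  { op: "u", ns: "confoo.People", o2: { _id: 1 }, o: { $set: { visits: 1 } } },
];

// Replay the whole log the given number of times into a fresh collection.
function replay(entries, times) {
  const people = new Map();
  for (let i = 0; i < times; i++) {
    for (const e of entries) applyOp(people, e);
  }
  return people;
}

console.log(replay(oplog, 1).get(1).visits); // 1
console.log(replay(oplog, 2).get(1).visits); // 1
```

Had the update been logged as "increment by one", the second replay would have produced a different value; logging the result keeps a secondary correct even if it re-applies entries after a restart.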

Page 25: Scaling with MongoDB

Replication: failover

Use heartbeat signal to detect failure

When primary can’t be reached, elect a new one

Replica that’s the most up-to-date is chosen

If there is skew, changes not on new primary are saved to a .bson file for manual reconciliation

Application can require data to be replicated to a majority to ensure this doesn’t happen
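
The failover rules on this slide can be sketched as a toy election. This is a simplified model under stated assumptions, not the real election protocol: each member has a made-up `optime` counter for how far it has replicated, a majority of members must be reachable, and the most up-to-date reachable member with non-zero priority wins.

```javascript
// Smallest number of members that forms a majority.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// Pick the reachable, electable member with the newest applied operation.
// Returns null when no majority is reachable or nobody is electable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.reachable);
  if (reachable.length < majority(members.length)) return null;
  const candidates = reachable.filter((m) => m.priority > 0);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

// A write is safe from rollback once a majority of members has it.
function acknowledgedByMajority(acks, memberCount) {
  return acks >= majority(memberCount);
}

const members = [
  { name: "a", reachable: false, priority: 1, optime: 105 }, // old primary, now down
  { name: "b", reachable: true, priority: 1, optime: 103 },
  { name: "c", reachable: true, priority: 1, optime: 100 },
];

console.log(electPrimary(members));         // b
console.log(acknowledgedByMajority(1, 3));  // false
console.log(acknowledgedByMajority(2, 3));  // true
```

In this example operations 104 and 105 existed only on the old primary, so they are the ones that would be set aside for manual reconciliation. A write that had been acknowledged by two of the three members could not be in that position.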

Page 26: Scaling with MongoDB

Replica sets: advanced topics

Priority
  Slower nodes with lower priority
  Backup or read-only nodes to never be primary

slaveDelay
  Fat-finger protection

Data center awareness and tagging
  Application can ensure complex replication guarantees
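
A replica set configuration that uses all three features might look roughly like the object below. Treat it as a sketch: the set name, host names, and tag values are invented, and in the mongo shell an object of this shape would be passed to `rs.initiate()` or `rs.reconfig()` rather than inspected with a helper function as it is here.

```javascript
// Hypothetical three-member replica set. Host names and tags are made up.
const config = {
  _id: "rs1",
  members: [
    // Preferred primary: fastest box, highest priority.
    { _id: 0, host: "db1.example.com:27017", priority: 2, tags: { dc: "east" } },
    // Slower node in the other data center: can become primary, but is less preferred.
    { _id: 1, host: "db2.example.com:27017", priority: 1, tags: { dc: "west" } },
    // Fat-finger protection: stays one hour behind, is hidden from
    // clients, and has priority 0 so it can never become primary.
    { _id: 2, host: "db3.example.com:27017", priority: 0, hidden: true,
      slaveDelay: 3600, tags: { dc: "east" } },
  ],
  settings: {
    // Custom write-concern mode: a write must reach members
    // carrying two distinct values of the "dc" tag.
    getLastErrorModes: { multiDC: { dc: 2 } },
  },
};

// Hosts that are allowed to become primary (priority defaults to 1).
function electableHosts(cfg) {
  return cfg.members
    .filter((m) => (m.priority === undefined ? 1 : m.priority) > 0)
    .map((m) => m.host);
}

console.log(electableHosts(config));
// [ 'db1.example.com:27017', 'db2.example.com:27017' ]
```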

Page 27: Scaling with MongoDB

When replication is not enough

Reads scale nicely
  As long as the working set fits in RAM
  … and you don’t mind eventual consistency

Sharding to the rescue!
  Automatically partitioned data sets
  Scale writes and reads
  Automatic load balancing between the shards

Page 28: Scaling with MongoDB

Sharding architecture

[Diagram: MongoS routers in front of four shards, each shard a replica set with one Primary and two Secondaries, plus a Configuration group of three config servers (Config 1, Config 2, Config 3)]

Shard 1: 0..10

Shard 2: 10..20

Shard 3: 20..30

Shard 4: 30..40

Page 29: Scaling with MongoDB

Sharding setup

Sharding is per-collection and range-based

The highest-impact choice (and hardest-to-change decision) you make is the shard key
  Random keys: good for writes, bad for reads
  Right-aligned index: bad for writes
  Small # of discrete keys: very bad
  Ideal: balance writes, make reads routable by mongos
  Optimal shard key selection is hard
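
The trade-offs in this list can be seen in a toy router. The sketch below reuses the four key ranges from the sharding architecture slide and treats each range as including its lower bound and excluding its upper bound; the routing functions are a simplified stand-in for what mongos does, and their names are made up.

```javascript
// Key ranges from the architecture slide: [min, max) per shard.
const chunks = [
  { shard: "Shard 1", min: 0, max: 10 },
  { shard: "Shard 2", min: 10, max: 20 },
  { shard: "Shard 3", min: 20, max: 30 },
  { shard: "Shard 4", min: 30, max: 40 },
];

// Range-based routing: find the shard whose range contains the key.
function routeByKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// A query that includes the shard key goes to one shard;
// a query without it must be sent to every shard.
function routeQuery(query) {
  if (query.key === undefined) return chunks.map((c) => c.shard);
  return [routeByKey(query.key)];
}

// Count how many writes land on each shard for a sequence of keys.
function writeDistribution(keys) {
  const counts = {};
  for (const c of chunks) counts[c.shard] = 0;
  for (const k of keys) counts[routeByKey(k)] += 1;
  return counts;
}

console.log(routeQuery({ key: 15 }));        // [ 'Shard 2' ]
console.log(routeQuery({}).length);          // 4
// Scattered keys spread writes across all shards...
console.log(writeDistribution([3, 17, 22, 38, 9, 11, 25, 31]));
// ...while ever-increasing (right-aligned) keys pile onto the last shard.
console.log(writeDistribution([32, 33, 34, 35, 36, 37, 38, 39]));
```

The last two calls show why a right-aligned key is bad for writes: all eight inserts hit Shard 4, whereas the scattered keys put two writes on each shard. The `routeQuery({})` case is the read-side cost of a key that queries cannot supply.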

Page 30: Scaling with MongoDB

Example sharding setup

[Diagram: three replica sets and the config servers spread across a Primary Data Center and a Secondary Data Center]

RS1: Shard 1 (priority 1), Shard 1 (priority 1), Shard 1 (priority 0)

RS2: Shard 2 (priority 1), Shard 2 (priority 1), Shard 2 (priority 0)

RS3: Shard 3 (priority 1), Shard 3 (priority 1), Shard 3 (priority 0)

Config: Config 1, Config 2, Config 3

Page 31: Scaling with MongoDB

Sharding benefits

Writes and reads both scale (with good choice of shard key)

Reads scale while remaining strongly consistent

Partitioning ensures you get more usable RAM

Pitfall: don’t wait too long to add capacity

Page 32: Scaling with MongoDB

Questions?

Rick Copeland @rick446 Arborian Consulting, LLC