managing a maturing mongodb ecosystem

Charity Majors @mipsytipsy Thursday, June 20, 13

TRANSCRIPT

Page 1: Managing a Maturing MongoDB Ecosystem

Charity Majors @mipsytipsy

Thursday, June 20, 13

Page 2: Managing a Maturing MongoDB Ecosystem

Managing a maturing MongoDB ecosystem


Page 3: Managing a Maturing MongoDB Ecosystem

automating with chef

performance tuning

disaster recovery


Page 4: Managing a Maturing MongoDB Ecosystem

chef.


Page 5: Managing a Maturing MongoDB Ecosystem

Basic replica set


Page 6: Managing a Maturing MongoDB Ecosystem

How do I chef that?

... grab the AWS and mongodb cookbooks, create a site wrapper cookbook


Page 7: Managing a Maturing MongoDB Ecosystem

make a role for your cluster,

launch some nodes,


Page 8: Managing a Maturing MongoDB Ecosystem

initiate the replica set,

... and you’re done.


Page 9: Managing a Maturing MongoDB Ecosystem

Adding snapshots


Page 10: Managing a Maturing MongoDB Ecosystem

adding RAID for EBS volumes


Page 11: Managing a Maturing MongoDB Ecosystem

this will bootstrap a new node for the cluster from snapshots

with this role ...


Page 12: Managing a Maturing MongoDB Ecosystem

multiple clusters

distinct cluster name, backup host, backup volumes


Page 13: Managing a Maturing MongoDB Ecosystem

sharding


Page 14: Managing a Maturing MongoDB Ecosystem

assign a shard name per cluster, per role

treat them like ordinary replica sets


Page 15: Managing a Maturing MongoDB Ecosystem

Arbiters

• Mongod processes that do nothing but vote

• Highly reliable

• To provision an arbiter, use the LWRP

• Easy to run multiple arbiters on a single host


Page 16: Managing a Maturing MongoDB Ecosystem

arbiter LWRP


Page 17: Managing a Maturing MongoDB Ecosystem

replica set with arbiters


Page 18: Managing a Maturing MongoDB Ecosystem

run multiple arbiters on a single host:


Page 19: Managing a Maturing MongoDB Ecosystem

Managing votes with arbiters


Page 20: Managing a Maturing MongoDB Ecosystem

tuning and performance.


Page 21: Managing a Maturing MongoDB Ecosystem

resources and provisioning

tuning your filesystem

snapshotting and warmups

fragmentation


Page 22: Managing a Maturing MongoDB Ecosystem

Provisioning tips

• Memory is your primary scaling constraint

• Your working set must fit into memory

• in 2.4, estimate with:

• Page faults? Your working set may not fit
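The estimate command itself did not survive in this transcript. As one possible sketch: MongoDB 2.4 can add a `workingSet` section to `serverStatus` output on request (in the shell, `db.serverStatus({workingSet: 1})`), which includes a `pagesInMemory` count. The helper names below are hypothetical, the 4 KB page size is an assumption, and the sample numbers are made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KB memory pages


def working_set_bytes(working_set, page_size=PAGE_SIZE_BYTES):
    """Convert the pagesInMemory estimate into bytes."""
    return working_set["pagesInMemory"] * page_size


def fits_in_ram(working_set, ram_bytes, page_size=PAGE_SIZE_BYTES):
    """True if the estimated working set is no larger than the given RAM."""
    return working_set_bytes(working_set, page_size) <= ram_bytes


# Illustrative values only, shaped like the workingSet section of serverStatus.
sample = {"pagesInMemory": 1_000_000, "overSeconds": 900}
estimate = working_set_bytes(sample)
```

A short `overSeconds` window only covers recently touched pages, so treat the result as a rough lower bound rather than a precise measurement.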


Page 23: Managing a Maturing MongoDB Ecosystem

Disk options

• If you’re on Amazon:

• EBS

• Dedicated SSD

• Provisioned IOPS

• Ephemeral

• If not:

• use SSDs!


Page 24: Managing a Maturing MongoDB Ecosystem

EBS classic

EBS with PIOPS:

... just say no to EBS


Page 25: Managing a Maturing MongoDB Ecosystem

SSD (hi1.4xlarge)

• 8 cores

• 60 gigs RAM

• 2 1-TB SSD drives

• 120k random reads/sec

• 85k random writes/sec

• expensive! $2300/mo on demand


Page 26: Managing a Maturing MongoDB Ecosystem

PIOPS

• Up to 2000 IOPS/volume

• Up to 1024 GB/volume

• Variability of < 0.1%

• Costs double regular EBS

• Supports snapshots

• RAID together multiple volumes for more storage/performance
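To make the last bullet concrete, here is a rough sizing sketch. It assumes the volumes are striped so that IOPS and capacity add up across them (RAID 0 style), and it uses the per-volume limits quoted on this slide. The function name is hypothetical.

```python
import math

MAX_IOPS_PER_VOLUME = 2000  # per-volume limit quoted on the slide
MAX_GB_PER_VOLUME = 1024    # per-volume limit quoted on the slide


def volumes_needed(target_iops, target_gb):
    """Smallest number of striped volumes covering both IOPS and capacity."""
    by_iops = math.ceil(target_iops / MAX_IOPS_PER_VOLUME)
    by_size = math.ceil(target_gb / MAX_GB_PER_VOLUME)
    return max(by_iops, by_size, 1)
```

A mirrored layout such as RAID 10 would need more volumes for the same usable capacity; this sketch does not model that.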


Page 27: Managing a Maturing MongoDB Ecosystem

Estimating PIOPS

• estimate how many IOPS to provision with the “tps” column of sar -d 1

• multiply that by 2-3x depending on your spikiness
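The arithmetic on this slide (a tps baseline times a 2-3x spikiness multiplier) can be sketched as below. Using the mean of the samples as the baseline is an assumption, and the function name is hypothetical. The samples are expected to be tps values you collected yourself from `sar -d 1`.

```python
import math
from statistics import mean


def estimate_piops(tps_samples, spikiness_factor=2.5):
    """Average observed tps, scaled by a headroom factor (the slide suggests 2-3x)."""
    if not tps_samples:
        raise ValueError("need at least one tps sample")
    return math.ceil(mean(tps_samples) * spikiness_factor)
```

Pick a factor closer to 3 for bursty workloads and closer to 2 for steady ones.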


Page 28: Managing a Maturing MongoDB Ecosystem

Ephemeral Storage

• Cheap

• Fast

• No network latency

• No snapshot capability

• Data is lost forever if you stop or resize the instance


Page 29: Managing a Maturing MongoDB Ecosystem

Filesystem and limits

• Raise file descriptor limits

• Raise connection limits

• Mount with noatime and nodiratime

• Consider putting the journal on a separate volume
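One way to check the mount options bullet is to inspect text in the format of `/proc/mounts` (device, mount point, filesystem type, options). This is a minimal sketch: the function name is hypothetical, and so are the device and data path in the sample line.

```python
def missing_mount_options(mounts_text, mount_point,
                          wanted=("noatime", "nodiratime")):
    """Return the wanted options that are absent for the given mount point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
            return [opt for opt in wanted if opt not in options]
    raise LookupError(f"{mount_point} not found in mounts text")


# Hypothetical sample line; on a real host, read /proc/mounts instead.
sample_mounts = "/dev/md0 /var/lib/mongodb ext4 rw,noatime 0 0"
missing = missing_mount_options(sample_mounts, "/var/lib/mongodb")
```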


Page 30: Managing a Maturing MongoDB Ecosystem

Blockdev

• Your default blockdev readahead setting is probably wrong

• Too large? you will underuse memory

• Too small? you will hit the disk too much

• Experiment.
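Assuming this slide is about the block device readahead value, it helps to know that `blockdev --getra` and `blockdev --setra` work in 512-byte sectors, not kilobytes. The conversion helpers below are hypothetical names for illustration.

```python
SECTOR_BYTES = 512  # blockdev readahead values are counted in 512-byte sectors


def readahead_kib(sectors):
    """Readahead size in KiB for a value reported by blockdev --getra."""
    return sectors * SECTOR_BYTES / 1024


def sectors_for_kib(kib):
    """Sector count to pass to blockdev --setra for a target size in KiB."""
    return int(kib * 1024) // SECTOR_BYTES
```

For example, a reported value of 256 sectors means 128 KiB is read ahead on each disk access, which is a lot if your documents are small and access is random.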


Page 31: Managing a Maturing MongoDB Ecosystem

Snapshot best practices

• Set priority = 0

• Set hidden = 1

• Consider setting votes = 0

• Lock mongo or stop mongod before snapshot

• Consider running continuous compaction on snapshot node
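The first three bullets map onto fields of a replica set member document (`priority`, `hidden`, `votes`). As a sketch, the hypothetical helper below edits a config dictionary shaped like the output of `rs.conf()`. The host names are placeholders, and bumping `version` is an assumption about what a raw `replSetReconfig` command expects.

```python
import copy


def as_snapshot_member(rs_config, host, nonvoting=True):
    """Return a copy of the config with `host` set up as a snapshot node."""
    new_config = copy.deepcopy(rs_config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["priority"] = 0   # never eligible to become primary
            member["hidden"] = True  # invisible to client reads
            if nonvoting:
                member["votes"] = 0  # optional, per the slide
            break
    else:
        raise LookupError(f"{host} is not in the replica set config")
    new_config["version"] = rs_config["version"] + 1
    return new_config


# Placeholder config for illustration.
config = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
        {"_id": 2, "host": "db3.example.com:27017"},
    ],
}
new_config = as_snapshot_member(config, "db3.example.com:27017")
```

Setting `votes` to 0 changes the total vote count, so re-check that the remaining votes are still an odd number.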


Page 32: Managing a Maturing MongoDB Ecosystem

Restoring from snapshot

• EBS snapshot will lazily-load blocks from S3

• run “dd” on each of the data files to pull blocks down

• Always warm up a secondary before promoting

• warm up both indexes and data

• http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/

• in mongodb 2.2 and above you can use the touch command:
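The command shown on this slide did not survive in the transcript. As a sketch of the two warmup steps above: the first helper builds a `dd` invocation that reads a data file end to end and discards the output, and the second builds a `touch` database command document with its `data` and `index` options. The helper names and the file path are hypothetical, and the 1 MB block size is an arbitrary choice.

```python
def dd_prefetch_command(data_file):
    """argv for reading a file fully so its EBS blocks are pulled down."""
    return ["dd", f"if={data_file}", "of=/dev/null", "bs=1M"]


def touch_command(collection, data=True, index=True):
    """Command document asking mongod to load a collection into memory."""
    return {"touch": collection, "data": data, "index": index}


# Hypothetical path and collection name.
argv = dd_prefetch_command("/var/lib/mongodb/mydb.0")
command = touch_command("users")
```

The argv list can be handed to `subprocess.run`, and the command document to whatever driver call you use to run database commands against the secondary.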


Page 33: Managing a Maturing MongoDB Ecosystem

Fragmentation

• Your RAM gets fragmented too!

• Leads to underuse of memory

• Deletes are not the only source of fragmentation

• Repair, compact, or resync regularly


Page 34: Managing a Maturing MongoDB Ecosystem

3 ways to fix fragmentation:

• Re-sync a secondary from scratch

• hard on your primary; use rs.syncFrom() to sync from a secondary instead

• Repair a secondary

• can cause small discrepancies in your data

• Run continuous compaction on your snapshot node

• won’t reset padding factors

• not appropriate if you do lots of deletes


Page 35: Managing a Maturing MongoDB Ecosystem

Fragmentation is terrible


Page 36: Managing a Maturing MongoDB Ecosystem

Upgrade!

mongo is getting faster. :)


Page 37: Managing a Maturing MongoDB Ecosystem

disasters and recovery.


Page 38: Managing a Maturing MongoDB Ecosystem

Finding bad queries

• db.currentOp()

• mongodb.log

• profiling collection


Page 39: Managing a Maturing MongoDB Ecosystem

db.currentOp()

• Check the queue size

• Any indexes building?

• Sort by num_seconds (the secs_running field)

• Sort by num_yields (the numYields field), locktype

• Consider adding comments to your queries

• Run explain() on queries that are long-running
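The sorting step can be sketched against a dictionary shaped like `db.currentOp()` output, which has an `inprog` list of operations with fields such as `opid`, `active`, and `secs_running`. The function name is hypothetical and the sample operations are made up.

```python
def longest_running(current_op, limit=10):
    """Active operations, longest-running first."""
    active = [op for op in current_op.get("inprog", []) if op.get("active")]
    active.sort(key=lambda op: op.get("secs_running", 0), reverse=True)
    return active[:limit]


# Illustrative data only.
sample = {
    "inprog": [
        {"opid": 1, "active": True, "secs_running": 3, "op": "query"},
        {"opid": 2, "active": True, "secs_running": 120, "op": "query"},
        {"opid": 3, "active": False, "op": "query"},
        {"opid": 4, "active": True, "secs_running": 45, "op": "update"},
    ]
}
slowest = [op["opid"] for op in longest_running(sample)]
```

The operations at the top of this list are the ones worth passing to explain().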


Page 40: Managing a Maturing MongoDB Ecosystem

mongodb.log

• Configure output with --slowms

• Look for high execution time, nscanned, ntoreturn

• See which queries are holding long locks

• Match connection ids to IPs


Page 41: Managing a Maturing MongoDB Ecosystem

system.profile collection

• Enable profiling with db.setProfilingLevel()

• Does not persist through restarts

• Like mongodb.log, but queryable

• Writes to this collection incur some cost

• Use db.system.profile.find() to get slow queries for a certain collection, time range, execution time, etc.
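A filter for that kind of lookup might look like the sketch below. It relies on the `ns`, `millis`, and `ts` fields of profile documents. The function name and the namespace are hypothetical, and the resulting dictionary would be passed to a `find()` on `system.profile` through your driver.

```python
from datetime import datetime, timedelta


def slow_query_filter(namespace, min_millis, start, end):
    """Query document for profile entries on one namespace and time range."""
    return {
        "ns": namespace,                      # "<database>.<collection>"
        "millis": {"$gte": min_millis},       # execution time threshold
        "ts": {"$gte": start, "$lt": end},    # time window, end exclusive
    }


# Hypothetical namespace and window.
window_end = datetime(2013, 6, 20, 12, 0, 0)
window_start = window_end - timedelta(hours=1)
query = slow_query_filter("mydb.users", 100, window_start, window_end)
```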


Page 42: Managing a Maturing MongoDB Ecosystem

... when queries pile up ...

• Know what your tipping point looks like

• Don’t switch your primary or restart

• Do kill queries before the tipping point

• Write your kill script before you need it

• Don’t kill internal mongo operations, only queries.
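The selection logic of such a kill script can be sketched as follows. This is not the author's script. It assumes `db.currentOp()`-style operation documents, and it uses a simple heuristic for "internal" operations: anything that is not a plain `query`, and anything touching the `local` database (where the oplog lives), is left alone. The function name and threshold are hypothetical.

```python
def kill_candidates(current_op, max_secs=30):
    """opids of long-running client queries that look safe to kill."""
    opids = []
    for op in current_op.get("inprog", []):
        if not op.get("active"):
            continue
        if op.get("op") != "query":
            continue  # leave writes, getmores, and commands alone
        if op.get("ns", "").startswith("local."):
            continue  # heuristic: skip oplog readers and other internals
        if op.get("secs_running", 0) < max_secs:
            continue
        opids.append(op["opid"])
    return opids
```

Each returned opid would then be passed to `db.killOp()`. Review the output by hand before automating it, since the heuristic may not catch every internal operation in your version.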


Page 43: Managing a Maturing MongoDB Ecosystem

can’t elect a master?

• Never run with an even number of votes (max 7)

• You need > 50% of votes to elect a primary

• Set your priority levels explicitly if you need warmup

• Consider delegating voting to arbiters

• Set snapshot nodes to be nonvoting if possible.

• Check your mongo log. Is something vetoing? Do they have an inconsistent view of the cluster state?
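The majority rule behind the first two bullets is simple arithmetic, sketched here with hypothetical function names.

```python
def votes_needed(total_votes):
    """Strict majority: more than 50% of all configured votes."""
    return total_votes // 2 + 1


def can_elect_primary(total_votes, reachable_votes):
    """True if the reachable members hold a strict majority."""
    return reachable_votes >= votes_needed(total_votes)


def tolerated_failures(total_votes):
    """How many votes can be lost while a primary can still be elected."""
    return total_votes - votes_needed(total_votes)
```

With 3 votes you need 2 and can lose 1. With 4 votes you need 3 and can still only lose 1, so the extra voter adds no fault tolerance, and an even split between two halves of the set leaves neither side able to elect a primary.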


Page 44: Managing a Maturing MongoDB Ecosystem

secondaries crashing?

• Some rare mongo bugs will cause all secondaries to crash unrecoverably

• Never kill oplog tailers or other internal database operations; this can also trash secondaries

• Arbiters are more stable than secondaries; consider using them to form a quorum with your primary


Page 45: Managing a Maturing MongoDB Ecosystem

replication stops?

• Other rare bugs will stop replication or cause secondaries to exit without a corrupt op

• The correct way to fix this is to re-snapshot off the primary and rebuild your secondaries.

• However, you can sometimes *dangerously* repair a secondary:

1. stop mongo

2. bring it back up in standalone mode

3. repair the offending collection

4. restart mongo again as part of the replica set


Page 46: Managing a Maturing MongoDB Ecosystem

• Everything is getting vaguely slower?

• check your padding factor, try compaction

• You rs.remove() a node and get weird driver errors?

• always shut down mongod after removing from replica set

• Huge background flush spike?

• probably an EBS or disk problem

• You run out of connection limits?

• possibly a driver bug

• hard-coded to 80% of soft ulimit until 20k is reached.
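The connection limit rule in the last bullet works out as below. This sketch only encodes what the slide states (80% of the soft file descriptor limit, capped at 20k); the function name is hypothetical.

```python
def effective_max_connections(soft_nofile_limit, hard_cap=20000):
    """80% of the soft ulimit for open files, capped at hard_cap."""
    return min(soft_nofile_limit * 80 // 100, hard_cap)
```

With a soft limit of 1024 this allows only 819 connections, which is why raising file descriptor limits matters. Past a soft limit of 25000 the 20k cap takes over.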


Page 47: Managing a Maturing MongoDB Ecosystem

• It looks like all I/O stops for a while?

• check your mongodb.log for large newExtent warnings

• also make sure you aren’t reaching PIOPS limits

• You get weird driver errors after adding/removing/re-electing?

• some drivers have problems with this; you may have to restart


Page 48: Managing a Maturing MongoDB Ecosystem

Glossary of resources

• Opscode AWS cookbook

• https://github.com/opscode-cookbooks/aws

• edelight MongoDB cookbook

• https://github.com/edelight/chef-mongodb

• Parse MongoDB cookbook fork

• https://github.com/ParsePlatform/Ops/tree/master/chef/cookbooks/mongodb

• Parse compaction scripts and warmup scripts

• http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/

• http://blog.parse.com/2013/03/26/always-be-compacting/


Page 49: Managing a Maturing MongoDB Ecosystem

Charity Majors @mipsytipsy
