MongoDB: Scaling write performance
Junegunn Choi

Upload: daum-dna

Posted on 29-Nov-2014


TRANSCRIPT

Page 1: Mongodb - Scaling write performance

MongoDB: Scaling write performance

Junegunn Choi

Page 2

MongoDB

• Document data store

• JSON-like document

• Secondary indexes

• Automatic failover

• Automatic sharding

Page 3

First impression: Easy

• Easy installation

• Easy data model

• No prior schema design

• Native support for secondary indexes

Page 4

Second thought: Not so easy

• No SQL

• Coping with massive data growth

• Setting up and operating sharded cluster

• Scaling write performance

Page 5

Today we’ll talk about insert performance

Page 6

Insert throughput on a replica set

Page 7

* 1kB record, ObjectId as PK
* WriteConcern: journal sync on majority

Steady 5k inserts/sec

Page 8

Insert throughput on a replica set

with a secondary index

Page 9
Page 10

Culprit: B+Tree index

• Good at sequential insert

• e.g. ObjectId, Sequence #, Timestamp

• Poor at random insert

• Indexes on randomly-distributed data

Page 11

Sequential vs. Random insert

(Diagram: two B+Trees, one filled with sequential keys 1, 2, 3, ..., 12, the other with randomly-distributed keys; each has its working set highlighted.)

Sequential insert ➔ Small working set ➔ Fits in RAM ➔ Sequential I/O (bandwidth-bound)

Random insert ➔ Large working set ➔ Cannot fit in RAM ➔ Random I/O (IOPS-bound)
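The asymmetry above can be simulated with a plain sorted list standing in for the index pages (a rough sketch, not MongoDB's actual B+Tree code): sequential keys only ever touch the right edge, while random keys touch positions all over the structure.

```python
import bisect
import random

def insert_positions(keys):
    """Return the index position where each successive key lands in a
    sorted list, a stand-in for which part of a B+Tree an insert touches."""
    sorted_keys = []
    positions = []
    for k in keys:
        pos = bisect.bisect_left(sorted_keys, k)
        sorted_keys.insert(pos, k)
        positions.append(pos)
    return positions

# Sequential keys (ObjectId, sequence #, timestamp): every insert lands
# at the right edge, so only one hot region is touched.
seq_positions = insert_positions(range(1000))
assert seq_positions == list(range(1000))

# Random keys: inserts scatter over the whole key space, so the
# working set is effectively the entire index.
random.seed(0)
rnd_positions = insert_positions([random.random() for _ in range(1000)])
print("distinct positions touched:", len(set(rnd_positions)))
```

The sequential run touches exactly one position per insert (the end); the random run spreads across nearly every position, which is why it stops fitting in RAM and becomes IOPS-bound.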

Page 12

So, what do we do now?

Page 13

1. Partitioning

(Diagram: one B+Tree that does not fit in memory, split into monthly partitions Aug 2012, Sep 2012, Oct 2012, each of which fits in memory.)

Page 14

1. Partitioning

• MongoDB doesn’t support partitioning

• Partitioning at application-level

• e.g. Daily log collection

• logs_20121012
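Application-level partitioning like this can be as simple as deriving the collection name from the clock. A minimal sketch (the helper name is hypothetical, not from the talk; it follows the slides' logs_YYYYMMDD naming):

```python
from datetime import datetime, timezone

def log_collection_name(when=None, hourly=False):
    """Derive the partition (collection) name from a timestamp,
    in the logs_YYYYMMDD style from the slides. Hypothetical helper."""
    when = when or datetime.now(timezone.utc)
    fmt = "logs_%Y%m%d%H" if hourly else "logs_%Y%m%d"
    return when.strftime(fmt)

# Writes always go to the "current" partition; old partitions can later
# be dropped wholesale, which is far cheaper than large remove() queries.
print(log_collection_name(datetime(2012, 10, 12, tzinfo=timezone.utc)))
# logs_20121012
```

Only the current collection's index is being written, so its working set stays small regardless of total data volume.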

Page 15

Switch collection every hour

Page 16

2. Better H/W

• More RAM

• More IOPS

• RAID striping

• SSD

• AWS Provisioned IOPS (1k ~ 10k)

Page 17
Page 18

3. More H/W: Sharding

• Automatic partitioning across nodes

(Diagram: mongos router in front of SHARD1, SHARD2, SHARD3.)

Page 19

3 shards (3x3)

Page 20

3 shards (3x3) on RAID 1+0

Page 21

There’s no free lunch

• Manual partitioning

• Incidental complexity

• Better H/W

• $

• Sharding

• $$

• Operational complexity

Page 22

“Do you really need that index?”

Page 23

Scaling insert performance with sharding

Page 24

= Choosing the right shard key

Page 25

Shard key example: year_of_birth

(Diagram: mongos router distributing the USERS collection across SHARD1, SHARD2, SHARD3 as 64MB chunks of year_of_birth ranges: ~ 1950, 1951 ~ 1970, 1971 ~ 1990, 1991 ~ 2005, 2006 ~ 2010, 2010 ~ ∞.)

Page 26

5k inserts/sec w/o sharding

Page 27

Sequential key

• ObjectId as shard key

• Sequence #

• Timestamp

Page 28

Worse throughput with 3x H/W.

Page 29

Sequential key

• All inserts into one chunk

• Cannot scale insert performance

• Chunk migration overhead

(Diagram: SHARD-x holds USERS chunks 1000 ~ 2000, 5000 ~ 7500, and 9000 ~ ∞; new keys 9001, 9002, 9003, 9004, ... all land in the last chunk.)

Page 30

Sequential key

Page 31

Hash key

• e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ...

• Distributes inserts evenly across all chunks
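Building such a key might look like this (a sketch; later MongoDB versions added native hashed shard keys, but at the time of this talk you computed and stored the hash yourself):

```python
import hashlib

def hash_shard_key(doc_id):
    """Hex SHA-1 of the _id. Adjacent _ids (9001, 9002, ...) hash to
    unrelated values, so inserts spread across all chunks."""
    return hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()

# Two consecutive ids land far apart in the shard-key space:
print(hash_shard_key(9001)[:8])
print(hash_shard_key(9002)[:8])
```

The trade-off, as the next slides show, is that the index on this key is itself randomly distributed.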

Page 32
Page 33

Hash key

• Performance drops as collection grows

• Why? Mandatory index on shard key

• B+Tree problem again!

Page 34

Sequential key vs. Hash key

Page 35

Sequential + hash key

• Coarse-grained sequential prefix

• e.g. Year-month + hash value

• 201210_24c3a5b9

(Diagram: B+Tree partitioned into key ranges 201208_*, 201209_*, 201210_*.)
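A sketch of the year-month + hash value key (the 8-hex-digit suffix length and SHA-1 choice are assumptions based on the 201210_24c3a5b9 example):

```python
import hashlib
from datetime import datetime, timezone

def coarse_prefix_key(doc_id, when):
    """Coarse-grained sequential prefix (YYYYMM) plus a hash suffix,
    e.g. '201210_24c3a5b9'. All writes in a given month share one
    prefix, so only that month's slice of the index stays hot."""
    suffix = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()[:8]
    return when.strftime("%Y%m") + "_" + suffix

key = coarse_prefix_key(1234, datetime(2012, 10, 15, tzinfo=timezone.utc))
print(key)  # starts with "201210_"
```

Within a month the hash suffix still spreads inserts across chunks; across months the active key range rolls forward like a partition.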

Page 36

But what if...

(Diagram: the same B+Tree ranges 201208_*, 201209_*, 201210_*, but with a large working set.)

Page 37

Sequential + hash key

• Can you predict data growth rate?

• Balancer not clever enough

• Only considers # of chunks

• Migration is slow during heavy writes

Page 38

Sequential key vs. Hash key vs. Sequential + hash key

Page 39

Low-cardinality hash key

• Small portion of hash value

• e.g. A~Z, 00~FF

• Alleviates B+Tree problem

• Sequential access on fixed # of parts

• Cardinality / # of shards

(Diagram: local B+Tree on a shard with key range A ~ D; inserts concentrate on a fixed set of hot spots A, B, C within that range.)
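A sketch of the low-cardinality variant using the 00 ~ FF form (256 buckets; taking the first byte of a SHA-1 is my choice of hash, not necessarily the talk's):

```python
import hashlib

def low_cardinality_key(doc_id):
    """Keep only the first byte of the hash: 256 possible values,
    00 ~ FF. Each shard then owns a fixed handful of buckets that
    are written near-sequentially."""
    first_byte = hashlib.sha1(str(doc_id).encode("utf-8")).digest()[0]
    return format(first_byte, "02x")

# Cardinality is capped by design, no matter how many documents exist:
distinct = {low_cardinality_key(i) for i in range(50_000)}
assert len(distinct) <= 256
```

With, say, 4 shards, each shard serves about 256 / 4 = 64 hot spots, so the per-shard index access pattern stays close to sequential.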

Page 40
Page 41

Low-cardinality hash key

• Limits the # of possible chunks

• e.g. 00 ~ FF ➔ 256 chunks

• Chunks grow past 64MB

• Balancing becomes difficult

Page 42

Sequential key vs. Hash key vs. Sequential + hash key vs. Low-cardinality hash key

Page 43

Low-cardinality hash prefix + sequential part

• e.g. Short hash prefix + timestamp

• Nice index access pattern

• Unlimited number of chunks

(Diagram: local B+Tree on a shard with key range A000 ~ C999; each prefix’s keys grow sequentially, e.g. A000 ... A123, B000 ... B123, C000 ... C123.)
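Combining the two parts might look like this (a sketch; the A ~ Z prefix alphabet and epoch-seconds suffix are assumptions based on the slide's A000 ~ C999 example):

```python
import hashlib
from datetime import datetime, timezone

def prefix_plus_seq_key(doc_id, when):
    """Short hash prefix (A ~ Z, 26 hot spots) followed by a sequential
    part (epoch seconds). The prefix caps the number of insertion points;
    the sequential suffix keeps each one append-only, while the total
    number of chunks can still grow without bound."""
    bucket = hashlib.sha1(str(doc_id).encode("utf-8")).digest()[0] % 26
    prefix = chr(ord("A") + bucket)
    return "%s_%d" % (prefix, int(when.timestamp()))

print(prefix_plus_seq_key(42, datetime(2012, 10, 12, tzinfo=timezone.utc)))
```

Each shard sees a few append-only key streams (small, RAM-sized working set), yet chunks can keep splitting along the timestamp dimension, which is what restores balanced, scalable inserts.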

Page 44

Finally, 2x throughput

Page 45

Lessons learned

• Know the performance impact of secondary indexes

• Choose the right shard key

• Test with large data sets

• Linear scalability is hard

• If you really need it, consider HBase or Cassandra

• SSD