Scaling Crittercism to 30,000 Requests Per Second and Beyond with MongoDB
TRANSCRIPT
Scaling to 30,000 Requests Per Second and Beyond with MongoDB
Mike Chesnut, Director of Operations Engineering
Crittercism
(Update: now 40,000 requests per second)
How a Startup Gets Started
● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t
What I’ll Talk About
● Crittercism - Background and Architecture
● Router (mongos) Architecture
● Sharding Considerations
● The Balancing Act
● Q&A
Critter-What?
A Brief History...
Our Founders (Rob, Andrew, Jeeyun): "Let’s make a mobile app! It’ll be awesome!"
(Unnamed Dating App)
Our Founders (Rob, Andrew, Jeeyun): "Our app isn’t so awesome after all..."
Architecture
[Diagram built up over several slides. Labels in the early builds: API, Feedback, Crashes, App Loads, Handled Exceptions. Labels in the final build: API (two instances), Crashes, Handled Exceptions, Metadata, Performance Data, Geo Data, App Loads, batch, DynamoDB, 40,000 req/s.]
Growth
Router Architecture
[Diagram built up over several slides: a MongoDB Cluster of three replica sets, each with three mongod servers, plus config servers; Client Application(s) made up of application servers, each running a client process and a mongos.]
[Condensed diagram: many application servers (app), each paired with a mongos (ms), connected to the replica sets (RS) and config servers (conf).]
Single mongos per client problems we encountered:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays
We went from this...
[Diagram: many application servers (app), each with its own mongos (ms), connected to the replica sets (RS) and config servers (conf).]
To this.
[Diagram: the same components rearranged, with the mongos (ms) layer separate from the application servers (app).]
[Detailed view: the mongos processes form a separate Router Tier between the Client Application(s) and the MongoDB Cluster.]
Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation
Disadvantages:
● additional network hop
● host failure has a larger effect
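The connection-count advantage can be made concrete with a little arithmetic. The following is a minimal Python sketch, not a measurement: the fleet sizes are hypothetical, and the model assumes (as a simplification) that every mongos holds the same fixed number of connections to each mongod and to each config server.

```python
from dataclasses import dataclass


@dataclass
class Fanout:
    mongos_processes: int         # hosts talking to the config servers
    conns_per_mongod: int         # inbound connections on each mongod
    conns_per_config_server: int  # inbound connections on each config server


def fanout(mongos_processes: int, conns_per_pair: int = 1) -> Fanout:
    """Estimate inbound connections created by a given number of mongos.

    Simplifying assumption of this sketch: each mongos keeps
    `conns_per_pair` connections to every mongod and every config server.
    """
    total = mongos_processes * conns_per_pair
    return Fanout(mongos_processes, total, total)


# Hypothetical fleet sizes, for illustration only.
APP_SERVERS = 500   # mongos-per-host: one mongos on every application server
ROUTER_TIER = 10    # separate tier: only these hosts run mongos

per_host = fanout(APP_SERVERS)
tiered = fanout(ROUTER_TIER)

print("mongos-per-host:", per_host)
print("router tier:    ", tiered)
```

Under these assumptions the load on each mongod and config server scales with the number of mongos processes, which is why shrinking that number from "one per application server" to "one per router host" helps.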
mongos-per-host failure:
[Diagram sequence over three slides; labels: RS, conf, ms, app.]
Separate mongos tier failure:
[Diagram sequence over three slides; labels: RS, conf, ms, app.]
So increase the number of mongos routers:
[Diagram: the same layout with additional mongos (ms) nodes.]
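The trade-off behind "host failure has a larger effect" and "increase the number of mongos routers" can be sketched numerically. This is an illustrative Python sketch with hypothetical numbers; it assumes application servers are spread as evenly as possible across the mongos hosts, and "affected" here only means the application server loses the router it was using (for example, it has to reconnect or fail over).

```python
import math


def apps_affected_by_one_mongos_failure(app_servers: int, mongos_hosts: int) -> int:
    """Worst-case number of application servers hit when one mongos host fails.

    Assumption of this sketch: application servers are distributed as
    evenly as possible across the mongos hosts.
    """
    if app_servers < 0 or mongos_hosts < 1:
        raise ValueError("need app_servers >= 0 and mongos_hosts >= 1")
    return math.ceil(app_servers / mongos_hosts)


APP_SERVERS = 100  # hypothetical fleet size

# mongos-per-host: as many mongos as application servers.
print("mongos-per-host:", apps_affected_by_one_mongos_failure(APP_SERVERS, APP_SERVERS))

# Separate tier: growing the tier shrinks the impact of a single failure.
for tier_size in (4, 10, 20):
    affected = apps_affected_by_one_mongos_failure(APP_SERVERS, tier_size)
    print(f"router tier of {tier_size}: {affected}")
```

The point of the sketch is only the shape of the curve: each router added to the tier reduces the share of clients that a single router failure can take down.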
Router Architecture - Evolve!
Maybe at first, doing the mongos-per-host architecture is fine. And it will probably remain fine for quite a while.
[Diagram: a mongos running on each application server next to the client process, in front of the MongoDB Cluster.]
This is an area where you can and should be willing to adapt as you go (and as needed).
[Diagram: the mongos processes moved into a separate Router Tier between the Client Application(s) and the MongoDB Cluster.]
Sharding Considerations
Pick something you want to live with.
What could we have done differently?
The Balancing Act
Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted some old data at one point, and left a bunch of holes
  ○ we turned it off while deleting this data
  ○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn it back on
Obviously, don’t do this. But if you do, here’s what happens...
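For the "turn it off for maintenance and forget to turn it back on" case, one possible safeguard is to wrap the maintenance work so that re-enabling the balancer is not a separate manual step. The Python sketch below is an illustration, not something from the talk: `stop` and `start` are hypothetical caller-supplied hooks, which in a real cluster might wrap the mongo shell helpers sh.stopBalancer() and sh.startBalancer(). It does not help with the other case above, where the balancer could not safely be turned back on.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def balancer_paused(stop: Callable[[], None],
                    start: Callable[[], None]) -> Iterator[None]:
    """Pause the balancer for the duration of a maintenance block.

    `stop` and `start` are hypothetical hooks supplied by the caller.
    """
    stop()
    try:
        yield
    finally:
        # Runs even if the maintenance code raises, so the balancer
        # is not left disabled by accident.
        start()


# Demo with in-memory stand-ins instead of a real cluster.
events = []
try:
    with balancer_paused(lambda: events.append("balancer stopped"),
                         lambda: events.append("balancer started")):
        events.append("maintenance")
        raise RuntimeError("maintenance step failed")
except RuntimeError:
    pass

print(events)
```

Even though the maintenance step fails in the demo, the "balancer started" event is still recorded last.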
Fresh, new, empty cluster… But no balancer running.
[Animation over several slides: data being inserted into the cluster.]
Now we’re pretty full, so let’s add another shard...
And keep inserting...
Suddenly we find ourselves with a very unbalanced cluster.
But if we enable the balancer, it will DoS the 5th shard!
The approximate effect looks something like this:
[Animation over several slides; figures not recoverable.]
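The effect described above can be approximated with a toy simulation. The Python sketch below uses a deliberately simplified model, not MongoDB's actual balancer logic: it repeatedly moves one chunk from the fullest shard to the emptiest one until they are within a threshold. The shard names and chunk counts are hypothetical; the starting state (four full shards and an empty fifth) follows the "5th shard" wording of the slide.

```python
from typing import Dict, Tuple


def simulate_balancer(chunks: Dict[str, int],
                      threshold: int = 1) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (final chunk counts, migrations received per shard).

    Simplified model: move one chunk at a time from the fullest shard to
    the emptiest shard until their counts differ by at most `threshold`.
    """
    chunks = dict(chunks)  # work on a copy
    received = {shard: 0 for shard in chunks}
    while True:
        fullest = max(chunks, key=chunks.get)
        emptiest = min(chunks, key=chunks.get)
        if chunks[fullest] - chunks[emptiest] <= threshold:
            return chunks, received
        chunks[fullest] -= 1
        chunks[emptiest] += 1
        received[emptiest] += 1


# Hypothetical state: four full shards, one freshly added empty shard.
start = {"shard1": 100, "shard2": 100, "shard3": 100, "shard4": 100, "shard5": 0}
final, received = simulate_balancer(start)

print("final chunk counts: ", final)
print("migrations received:", received)
```

In this model every migration targets the new shard, which is the concentration of write load the slide warns about.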
So what can we do?
1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer
How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
4. allow the system to settle
5. repeat
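The manual procedure above can be sketched as a loop. This Python sketch is an illustration under stated assumptions, not the tooling used at Crittercism: "hot" is taken to mean "holds the most chunks", and `chunk_counts`, `move_one_chunk`, and `is_healthy` are hypothetical caller-supplied hooks. In a real cluster, `move_one_chunk` might wrap the `moveChunk` admin command (or the sh.moveChunk() shell helper), and `is_healthy` would consult whatever monitoring is in place.

```python
import time
from typing import Callable, Dict


def rebalance_manually(
    chunk_counts: Callable[[], Dict[str, int]],
    move_one_chunk: Callable[[str, str], None],
    is_healthy: Callable[[str, str], bool],
    settle_seconds: float = 60.0,
    max_moves: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Move chunks one at a time from the hottest shard to the coldest.

    Returns the number of chunks moved. Stops when the shards are within
    one chunk of each other, when a shard looks unhealthy, or after
    `max_moves` moves.
    """
    moved = 0
    while moved < max_moves:
        counts = chunk_counts()
        source = max(counts, key=counts.get)   # 1. pick a hot shard
        target = min(counts, key=counts.get)
        if counts[source] - counts[target] <= 1:
            break                              # close enough to balanced
        if not is_healthy(source, target):     # 2. monitor source and target
            break                              # back off instead of piling on
        move_one_chunk(source, target)         # 3. move the chunk
        moved += 1
        sleep(settle_seconds)                  # 4. allow the system to settle
    return moved                               # 5. the loop is the "repeat"


# Demo with an in-memory stand-in for the cluster (hypothetical counts).
counts = {"shard1": 6, "shard2": 5, "shard5": 0}


def fake_move(source: str, target: str) -> None:
    counts[source] -= 1
    counts[target] += 1


moved = rebalance_manually(
    chunk_counts=lambda: dict(counts),
    move_one_chunk=fake_move,
    is_healthy=lambda source, target: True,
    settle_seconds=0,
    max_moves=100,
    sleep=lambda seconds: None,
)
print(moved, counts)
```

Keeping `max_moves` small and `settle_seconds` generous matches the "slowly move chunks" advice: the loop is meant to be run in small, supervised batches.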
Conclusion here:
Run the balancer!
Summary
● Design ahead of time
  ○ “NoSQL” lets you play it by ear
  ○ but some of these decisions will bite you later
● Be willing to correct past mistakes
  ○ dedicate time and resources to adapting
  ○ learn how to live with the mistakes you can’t correct
References
● MongoDB Blog post (details on shard migration): http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB Webinar (details on manual chunk migrations): http://www.mongodb.com/presentations/webinar-back-basics-3-scaling-30000-requests-second-mongodb
● Documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● Documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● Documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/
Crittercism: http://www.crittercism.com/ to learn more, and http://www.crittercism.com/careers/ if you want to help us!
Q&A
Thank You!