Partner Webinar: The Scaling Checklist for MongoDB - 100GB and Beyond
DESCRIPTION
MongoHQ knows there is something special about 100 GB of data. Our customers that hit 100 GB are running core pieces of their business on a scalable MongoDB platform. In this presentation, we will walk through a cloud-focused scaling checklist that will help you quickly and securely blow past the 100 GB milestone. Using customer examples and best-practice MongoDB use cases, we'll help prepare you to get to the data size your business needs.
TRANSCRIPT
www.mongohq.com Scaling Checklist for MongoDB
Scaling Checklist for MongoDB
100GB & Beyond
MongoHQ
www.mongohq.com | @mongohq
MongoHQ is a fully-managed platform used by developers to deploy, host and scale open-source databases.
Chris [email protected]
I’ve spoken at a number of MongoDB conferences on optimizing queries. I’ve been with MongoHQ for two
years – prior to that I built applications for the education and technical sectors.
TL;DR
• 100GB of data is relatively big data
• MongoDB has comparative advantages
• MongoDB has absolute constraints
• Know the MongoDB gauges
• Surpassing 100GB requires:
– Understanding absolute constraints
– Knowledge of application’s data consumption
– Optimization of data consumption to comparative advantages
Audience Survey
What is your data size? Choose the biggest bucket.
A. < 10GB
B. < 50GB
C. < 75GB
D. < 100GB
E. > 100 GB
100 GB Checklist
1. Identify your data behavior
2. Use MongoDB for comparative advantages
3. Know the MongoDB indexing constraints
4. Refactor schema to simplify queries
5. Remove data that does not fit MongoDB
6. Separate hot and cold data
7. Stop using `mongodump`
8. Check your gauges
9. Avoid queries causing page faults
10. Track and monitor slow queries
11. Buying time with hardware
Identify your data behavior
1. Small v. Large – type of data
2. Fast v. Slow – behavior of data
3. Complex v. Simple – type of queries
4. Known v. Unknown – behavior of queries
5. Queuing v. Application data
This can happen at the planning, staging, or production phase.
Patterns of your Data
[Quadrant chart: data size (Small, Large) against data speed (Fast, Slow).]
Modern applications have all patterns
[Quadrant chart, Small / Large against Fast / Slow, plotting: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
Where doesn’t MongoDB excel?
[Quadrant chart, Small / Large against Fast / Slow, plotting: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
4th dimension is time
[Quadrant chart, Small / Large against Fast / Slow, plotting: main application collections; today’s data; last week’s data.]
Data types to avoid with MongoDB
[Quadrant chart, Small / Large against Fast / Slow, plotting: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
What type of queries do you have?
[Quadrant chart: query predictability (Unknown, Known) against query complexity (Simple, Complex).]
Modern applications have all types of queries
[Quadrant chart, Unknown / Known against Simple / Complex, plotting: data discovery; application search; key-value; single-range query; user-generated search; internal metrics; multi-range query.]
Queries to Avoid with MongoDB
[Quadrant chart, Unknown / Known against Simple / Complex, plotting: data discovery; application search; key-value; single-range query; user-generated search; internal metrics; multi-range query.]
4th Dimension is Time
[Quadrant chart, Unknown / Known against Simple / Complex, plotting: real-time core of application; today’s data; last week’s data.]
Queries and MongoDB
[Quadrant chart, Unknown / Known against Simple / Complex, with regions labeled MongoDB, SQL, and Elasticsearch (which appears twice).]
MongoDB’s Technical Comparative Advantage
• Expressive data structure allows simplification of complex data relationships
• Create simple, known queries and return expressive relationships
• On-the-fly addition of attributes / columns
• Total Cost of Ownership*
MongoDB Indexing Constraints
• Only one index can be used per query
• Only one range operator can be used per index
• Range operator must be the last field on index
• Know how to use the right side of indexes
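The constraints above suggest a rule of thumb for compound indexes: equality fields on the left, the single range field on the right. Below is a minimal sketch of that ordering. The helper `buildIndexSpec`, the `created_at` field, and the sample query are assumptions made for illustration and do not come from the talk.

```javascript
// Hypothetical helper: build a compound index spec with the equality
// fields first and the single range field last.
function buildIndexSpec(equalityFields, rangeField) {
  const spec = {};
  for (const field of equalityFields) {
    spec[field] = 1;          // equality matches go on the left
  }
  if (rangeField) {
    spec[rangeField] = 1;     // the one range operator goes on the right
  }
  return spec;
}

// Assumed example query: one equality match plus one range.
const exampleQuery = { recipient_id: 42, created_at: { $gt: new Date(0) } };
const indexSpec = buildIndexSpec(["recipient_id"], "created_at");

// In the mongo shell this could be applied with something like:
//   db.messages.ensureIndex(indexSpec)        // createIndex in newer releases
//   db.messages.find(exampleQuery).explain()  // check which index is used
```

Running `explain()` on the real query is the way to confirm that the index is actually chosen; the helper only encodes the field ordering.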
What does it mean to optimize?
Scaling to 100GB involves moving queries from complex to simple and from unknown to known.
[Quadrant chart, Unknown / Known against Simple / Complex, with two points marked Start and one marked Finish.]
Example of simplifying a query
Goal: find the most recent messages for a person’s message stream.
Naïve query:
db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
Second attempt:
db.messages.find({participant_ids: <id>}).sort({_id: -1})
Best approach:
db.users.find({_id: <id>})
Naïve Query
Document
{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <user_id>, recipient_id: <user_id>}
Query
db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
Second Attempt
Document
{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <sender_id>, recipient_id: <recipient_id>, participant_ids: [<sender_id>,<recipient_id>]}
Query
db.messages.find({participant_ids: <id>}).sort({_id: -1})
Best approach
Document
{ _id: <id>, name: "Clark Kent", recent_messages: [ <…50 denormalized messages…> ]}
Query
db.users.find({_id: <id>})
Hint: use $push with $sort and $slice to keep only the last 50 messages.
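One way the $push, $sort, and $slice hint might look in practice is sketched below. The helper names are hypothetical. The sketch sorts ascending and keeps the tail with a negative $slice, and the in-memory `applyPush` model assumes numeric `_id` values purely so the trimming can be checked without a database.

```javascript
// Build the update that appends one message to a user's recent_messages
// array and trims the array to the newest `limit` entries.
function pushRecentMessage(message, limit) {
  return {
    $push: {
      recent_messages: {
        $each: [message],     // $sort and $slice are modifiers used with $each
        $sort: { _id: 1 },    // oldest first...
        $slice: -limit        // ...then keep only the last `limit` entries
      }
    }
  };
}

// Plain-JavaScript model of the modifier's effect (illustration only;
// assumes numeric _id values so they can be compared by subtraction).
function applyPush(messages, update) {
  const mod = update.$push.recent_messages;
  const merged = messages.concat(mod.$each);
  merged.sort((a, b) => a._id - b._id);
  return merged.slice(mod.$slice);
}

// mongo shell usage would be along the lines of:
//   db.users.update({ _id: userId }, pushRecentMessage(msg, 50))
```

With this ordering the array is stored oldest first, so an application that wants newest first would reverse it after the single `_id` lookup.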
How did we optimize?
We took a known, complex query and made it simple.
[Quadrant chart, Unknown / Known against Simple / Complex, with one point marked Start and one marked Finish.]
Methods for Simplifying Queries
• Bucket values
• Create summary attributes
• Pre-compute values
• Use expressive document structures
• Sort and filter at the application level
• Create summary documents
• Divide and measure (more on this later)
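Several of the methods above (bucketing values, pre-computing, summary documents) can be combined in a small sketch like the following. It assumes a hypothetical `daily_message_counts` collection holding one summary document per user per day. The collection name, the `messages_sent` field, and the `_id` format are illustrative and do not come from the talk.

```javascript
// Hypothetical: one summary document per user per day, keyed by a
// bucketed date string and incremented whenever a message is sent.
function dailyCountUpsert(userId, sentAt) {
  const day = sentAt.toISOString().slice(0, 10);  // bucket to YYYY-MM-DD (UTC)
  return {
    filter: { _id: userId + ":" + day },
    update: { $inc: { messages_sent: 1 } },
    options: { upsert: true }
  };
}

const countOp = dailyCountUpsert("u123", new Date("2013-10-01T15:30:00Z"));

// mongo shell usage would be along the lines of:
//   db.daily_message_counts.update(countOp.filter, countOp.update, countOp.options)
// Reading the day's total is then a single _id lookup:
//   db.daily_message_counts.find({ _id: "u123:2013-10-01" })
```

The design choice here is to pay a small cost on every write so that the read becomes a simple, known query instead of a count over a range.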
Remove “unrefactorable” data
[Quadrant chart, Small / Large against Fast / Slow, plotting: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background. One region is labeled Redis.]
Move up and right, or find another tool
[Quadrant chart, Unknown / Known against Simple / Complex, with a region labeled MongoDB, plotting: data discovery; application search; user-generated search; multi-range query.]
4th Dimension is Time
[Quadrant chart, Unknown / Known against Simple / Complex, plotting: real-time core of application; today’s data (fast); last week’s data (slower).]
Separate Data with Cross Purposes
• If today’s data must be fast, and last week’s data can be slow:
– Age out today’s data using TTL collections
– Use another database for last week’s data
– Use high-RAM-ratio, SSD-backed machines for today’s data
– Use cheaper hardware for last week’s data
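As a rough sketch of the TTL approach above: a TTL index removes documents once a date field is older than a set number of seconds. The helper name, the `created_at` field, and the `todays_events` collection are assumptions for illustration. Copying data to the slower database before it expires is left out.

```javascript
// Build the arguments for a TTL index: documents become eligible for
// removal once the indexed date field is older than `days` days.
function ttlIndexArgs(dateField, days) {
  const keys = {};
  keys[dateField] = 1;
  return [keys, { expireAfterSeconds: days * 24 * 60 * 60 }];
}

const [ttlKeys, ttlOptions] = ttlIndexArgs("created_at", 1);

// mongo shell usage would be along the lines of:
//   db.todays_events.ensureIndex(ttlKeys, ttlOptions)  // createIndex in newer releases
// The indexed field must hold a date value for documents to expire.
```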
MongoDB Doesn’t have Joins
Data doesn’t have to be adjacent.
Divide, measure, conquer.
Stop Using `mongodump`
`mongodump` is a long-running table scan that exports all documents. It pulls cold data into RAM, pushes out the working set, and causes performance issues.
Self-hosting: use MongoDB MMS and Backup
As-a-service: ask your vendor about backup alternatives
Check your gauges
Configure MMS Now!
Avoid Page Faults like the Plague
[Bar chart: MongoDB operations per second (axis from 0 to 8000) for workloads with 50%, 1%, and 0% table scans.]
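One gauge worth watching here is the page-fault rate. The sketch below derives it from two `db.serverStatus()` samples, assuming the `extra_info.page_faults` counter is reported on your platform (it is on Linux). The sample values are made up for illustration.

```javascript
// Page faults per second between two serverStatus() samples.
function pageFaultsPerSecond(earlier, later) {
  const faults = later.extra_info.page_faults - earlier.extra_info.page_faults;
  const seconds = later.uptime - earlier.uptime;
  return seconds > 0 ? faults / seconds : 0;
}

// Made-up samples shaped like the relevant parts of db.serverStatus().
const sampleA = { uptime: 1000, extra_info: { page_faults: 5000 } };
const sampleB = { uptime: 1060, extra_info: { page_faults: 5600 } };
const faultRate = pageFaultsPerSecond(sampleA, sampleB);  // 600 faults over 60 s

// In the mongo shell, the samples would come from two calls to
// db.serverStatus() taken some seconds apart.
```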
What type of queries cause page faults?
[Quadrant chart, Unknown / Known against Simple / Complex, with a region labeled MongoDB, plotting: data discovery; application search; user-generated search; multi-range query.]
Track & Remove Slow Queries
• system.profile collection
• MongoDB professor
• Dex
• MongoHQ Slow Query Tracker and Profiler
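For the first option, a minimal sketch of working with profiler output follows. It assumes documents shaped like `system.profile` entries, with `ns`, `op`, and `millis` fields. The helper name and the 100 ms threshold are illustrative choices.

```javascript
// Group profiler entries by namespace and operation, keep only those at
// or above thresholdMs, and list the slowest groups first.
function summarizeSlowOps(profileDocs, thresholdMs) {
  const groups = {};
  for (const doc of profileDocs) {
    if (doc.millis < thresholdMs) continue;
    const key = doc.ns + " " + doc.op;
    const g = groups[key] || (groups[key] = { key: key, count: 0, maxMillis: 0 });
    g.count += 1;
    g.maxMillis = Math.max(g.maxMillis, doc.millis);
  }
  return Object.values(groups).sort((a, b) => b.maxMillis - a.maxMillis);
}

// In the mongo shell, the input could be gathered with something like:
//   db.setProfilingLevel(1, 100)   // record operations slower than 100 ms
//   db.system.profile.find({ millis: { $gte: 100 } }).sort({ ts: -1 }).toArray()
```

Grouping by namespace and operation makes it easier to see which collection needs an index or a schema change, rather than reading entries one by one.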
Buying time with hardware has a limited life
• Don’t get addicted to buying more hardware.
• Before any purchasing decision, always:
– consider optimization
– investigate separating and paring data
Thank you!
For any questions: [email protected]
www.mongohq.com | @mongohq