Transcript
Page 1: Partner Webinar: The Scaling Checklist for MongoDB - 100GB and beyond

www.mongohq.com Scaling Checklist for MongoDB

Scaling Checklist for MongoDB

100GB & Beyond

Page 2:

MongoHQ

www.mongohq.com | @mongohq

MongoHQ is a fully-managed platform used by developers to deploy, host and scale open-source databases.

Chris [email protected]

I’ve spoken at a number of MongoDB conferences on optimizing queries. I’ve been with MongoHQ for two years; prior to that I built applications for the education and technical sectors.

Page 3:

TL;DR

• 100GB of data is relatively big data
• MongoDB has comparative advantages
• MongoDB has absolute constraints
• Know the MongoDB gauges
• Surpassing 100GB requires:
  – Understanding absolute constraints
  – Knowledge of the application’s data consumption
  – Optimization of data consumption to comparative advantages

Page 4:

Audience Survey

What is your data size? Choose the biggest bucket.

A. < 10GB
B. < 50GB
C. < 75GB
D. < 100GB
E. > 100GB

Page 5:

100 GB Checklist

1. Identify your data behavior
2. Use MongoDB for comparative advantages
3. Know the MongoDB indexing constraints
4. Refactor schema to simplify queries
5. Remove data that does not fit MongoDB
6. Separate hot and cold data
7. Stop using `mongodump`
8. Check your gauges
9. Avoid queries causing page faults
10. Track and monitor slow queries
11. Buying time with hardware

Page 7:

Identify your data behavior

1. Small v. Large – type of data
2. Fast v. Slow – behavior of data
3. Complex v. Simple – type of queries
4. Known v. Unknown – behavior of queries
5. Queuing v. Application data

This can happen at the planning, staging, or production phase.

Page 8:

Patterns of your Data

[Chart: a grid with one axis running Small to Large and the other Fast to Slow.]

Page 9:

Modern applications have all patterns

[Chart: the Small/Large by Fast/Slow grid with these kinds of data placed on it:]

• Main application collections
• Application metadata
• Secondary application collections
• Internal metrics
• Event logs and event data
• Queues, OLTP, messages
• Rendered in background

Page 10:

Where doesn’t MongoDB excel?

[Chart: the Small/Large by Fast/Slow grid, showing main application collections, application metadata, secondary application collections, internal metrics, event logs and event data, queues, OLTP, messages, and data rendered in background.]

Page 11:

4th dimension is time

[Chart: the Small/Large by Fast/Slow grid, showing main application collections, today’s data, and last week’s data.]

Page 12:

Data-types to avoid with MongoDB

[Chart: the Small/Large by Fast/Slow grid, showing main application collections, application metadata, secondary application collections, internal metrics, event logs and event data, queues, OLTP, messages, and data rendered in background.]

Page 13:

What type of queries do you have?

[Chart: a grid with one axis running Unknown to Known and the other Simple to Complex.]

Page 14:

Modern applications have all types of queries

[Chart: the Unknown/Known by Simple/Complex grid with these query types placed on it:]

• Data discovery
• Application search
• Key-value
• Single-range query
• User-generated search
• Internal metrics
• Multi-range query

Page 15:

Queries to Avoid with MongoDB

[Chart: the Unknown/Known by Simple/Complex grid, showing data discovery, application search, key-value, single-range query, user-generated search, internal metrics, and multi-range query.]

Page 16:

4th Dimension is Time

[Chart: the Unknown/Known by Simple/Complex grid, showing the real-time core of the application, today’s data, and last week’s data.]

Page 17:

Queries and MongoDB

[Chart: the Unknown/Known by Simple/Complex grid, with regions labeled MongoDB, Elastic Search, and SQL.]

Page 19:

MongoDB’s Technical Comparative Advantage

• Expressive data structure allows simplification of complex data relationships

• Create simple, known queries and return expressive relationships

• On-the-fly addition of attributes / columns

• Total Cost of Ownership*

Page 21:

MongoDB Indexing Constraints

• Only one index can be used per query

• Only one range operator can be used per index

• Range operator must be the last field on index

• Know how to use the right side of indexes
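As a rough sketch of these constraints, the plain JavaScript below builds a compound index key document that puts equality-matched fields first and the single range-matched field last. The helper name `buildIndexSpec`, the `events` collection, and the field names are hypothetical, chosen only for illustration and not taken from the webinar.

```javascript
// Hypothetical helper: order index keys so that equality-matched fields
// come first and the one range-matched field sits on the right side.
function buildIndexSpec(equalityFields, rangeField) {
  const spec = {};
  for (const field of equalityFields) {
    spec[field] = 1; // ascending
  }
  if (rangeField) {
    spec[rangeField] = 1; // the range field goes last
  }
  return spec;
}

// Illustrative query shape: two equality matches plus one range.
const query = {
  account_id: 42,
  type: "click",
  created_at: { $gte: new Date("2014-01-01") },
};

const indexSpec = buildIndexSpec(["account_id", "type"], "created_at");
console.log(JSON.stringify(indexSpec));
// In the mongo shell this could be used as (assumed collection name):
//   db.events.createIndex(indexSpec)
```

The point of the ordering is that one index can serve the whole query: the equality fields narrow the scan, and the range is applied on the final field.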

Page 23:

What does it mean to optimize?

Scaling to 100GB involves moving queries from complex to simple and from unknown to known.

[Chart: the Unknown/Known by Simple/Complex grid, with two points marked Start and one marked Finish.]

Page 24:

Example of simplifying a query

Goal: find the most recent messages for a person’s message stream.

Naïve query:

db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})

Second attempt:

db.messages.find({participant_ids: <id>}).sort({_id: -1})

Best approach:

db.users.find({_id: <id>})

Page 25:

Naïve Query

Document:

{
  _id: <id>,
  message: "Wow, this pizza is good!",
  sender_id: <user_id>,
  recipient_id: <user_id>
}

Query:

db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})

Page 26:

Second Attempt

Document:

{
  _id: <id>,
  message: "Wow, this pizza is good!",
  sender_id: <sender_id>,
  recipient_id: <recipient_id>,
  participant_ids: [<sender_id>, <recipient_id>]
}

Query:

db.messages.find({participant_ids: <id>}).sort({_id: -1})
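A minimal sketch of how the extra array might be filled in when a message is stored. The helper `withParticipants`, the numeric ids, and the index shown are assumptions for illustration; the slides do not show how the field is written or indexed.

```javascript
// Hypothetical helper: add the participant_ids array that the second
// attempt queries on, leaving the original message object untouched.
function withParticipants(message) {
  return Object.assign({}, message, {
    participant_ids: [message.sender_id, message.recipient_id],
  });
}

const stored = withParticipants({
  _id: 1,
  message: "Wow, this pizza is good!",
  sender_id: 10,
  recipient_id: 20,
});
console.log(JSON.stringify(stored.participant_ids)); // prints [10,20]

// A compound index matching the query and its sort (an assumption):
const participantIndex = { participant_ids: 1, _id: -1 };
// In the mongo shell: db.messages.createIndex(participantIndex)
```

With both ids in one array, the `$or` over two fields becomes a single equality match on one indexed field.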

Page 27:

Best approach

Document:

{
  _id: <id>,
  name: "Clarke Kent",
  recent_messages: [ <…50 denormalized messages…> ]
}

Hint: use $push with $sort and $slice to keep only the last 50.

Query:

db.users.find({_id: <id>})
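The hint can be sketched as an update document. The helper names and the numeric `_id` values below are hypothetical, and `simulatePush` is only a plain-JavaScript imitation of the update's effect so the result can be seen without a database; it is not MongoDB code.

```javascript
// Hypothetical helper: build the update document for one new message.
function buildRecentMessagesUpdate(message, limit) {
  return {
    $push: {
      recent_messages: {
        $each: [message],  // $sort and $slice are modifiers of the $each form
        $sort: { _id: 1 }, // oldest first
        $slice: -limit,    // negative: keep only the last `limit` entries
      },
    },
  };
}

// Imitation of what that update does to the array, for illustration only.
function simulatePush(existing, update) {
  const spec = update.$push.recent_messages;
  const merged = existing.concat(spec.$each);
  merged.sort((a, b) => a._id - b._id);
  return merged.slice(spec.$slice);
}

let recent = [];
for (let id = 1; id <= 60; id++) {
  const update = buildRecentMessagesUpdate({ _id: id, message: "msg " + id }, 50);
  recent = simulatePush(recent, update);
}
console.log(recent.length, recent[0]._id, recent[recent.length - 1]._id);
// prints: 50 11 60

// In the mongo shell the update document could be applied as:
//   db.users.update({_id: <id>}, buildRecentMessagesUpdate(<message>, 50))
```

Presumably each new message would be pushed onto the user documents of both the sender and the recipient, so that either person's stream is a single lookup by `_id`.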

Page 28:

How did we optimize?

We took a known, complex query and made it simple.

[Chart: the Unknown/Known by Simple/Complex grid, with a Start point and a Finish point.]

Page 29:

Methods for Simplifying Queries

• Bucket values

• Create summary attributes

• Pre-compute values

• Use expressive document structures

• Sort and filter at the application level

• Create summary documents

• Divide and measure (more on this later)
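One way to picture bucketing, pre-computing, and summary documents together is a per-day counter document that is incremented as events arrive. This is only a sketch under assumptions: the helper `buildSummaryUpsert`, the `daily_stats` collection, the `_id` format, and the field names are all hypothetical.

```javascript
// Hypothetical helper: turn one page-view event into an upsert against a
// per-page, per-day summary document with pre-computed counters.
function buildSummaryUpsert(pageId, timestamp) {
  const day = timestamp.toISOString().slice(0, 10); // e.g. "2014-03-05"
  const hour = timestamp.getUTCHours();             // hourly bucket within the day
  const inc = { total: 1 };
  inc["hours." + hour] = 1;
  return {
    filter: { _id: pageId + ":" + day },
    update: { $inc: inc },
    options: { upsert: true },
  };
}

const op = buildSummaryUpsert("home", new Date("2014-03-05T14:30:00Z"));
console.log(JSON.stringify(op));
// In the mongo shell (assumed collection name):
//   db.daily_stats.update(op.filter, op.update, op.options)
```

Reading a day's totals then becomes a simple, known query on `_id` instead of a range scan over raw events.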

Page 31:

Remove “unrefactorable” data

[Chart: the Small/Large by Fast/Slow grid, showing main application collections, application metadata, secondary application collections, internal metrics, event logs and event data, queues, OLTP, messages, and data rendered in background, with one region labeled Redis.]

Page 32:

Move up and right, or find another tool

[Chart: the Unknown/Known by Simple/Complex grid, with a region labeled MongoDB and these query types: data discovery, application search, user-generated search, and multi-range query.]

Page 34:

4th Dimension is Time

[Chart: the Unknown/Known by Simple/Complex grid, showing the real-time core of the application, today’s data (fast), and last week’s data (slower).]

Page 35:

Separate Data with Cross Purposes

• If today’s data must be fast, and last week’s data can be slow:
  – Roll out today’s data using TTL collections
  – Use another database for last week’s data
  – Use high-RAM-ratio, SSD-backed machines for today’s data
  – Use cheaper hardware for last week’s data
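A TTL collection is an ordinary collection with a TTL index on a date field. The sketch below builds the arguments for such an index; the helper `buildTtlIndex`, the `created_at` field, and the `events_today` collection are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: build the arguments for a TTL index that expires
// documents a given number of days after the value in their date field.
function buildTtlIndex(dateField, days) {
  const keys = {};
  keys[dateField] = 1;
  return { keys: keys, options: { expireAfterSeconds: days * 24 * 60 * 60 } };
}

const ttl = buildTtlIndex("created_at", 1);
console.log(JSON.stringify(ttl));
// In the mongo shell (assumed collection name):
//   db.events_today.createIndex(ttl.keys, ttl.options)
```

The indexed field has to hold a date for expiry to apply. Copying documents to the slower database before they expire would be the application's job.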

Page 36:

MongoDB Doesn’t have Joins

Data doesn’t have to be adjacent.

Divide, measure, conquer.

Page 38:

Stop Using `mongodump`

`mongodump` is a long-running table scan that exports all documents. Reading every document disrupts what is cached in RAM and causes performance issues.

Self-hosting: use MongoDB MMS and Backup

As-a-service: ask your vendor about backup alternatives

Page 40:

Configure MMS Now!

Page 42:

Avoid Page Faults like the Plague

[Bar chart: MongoDB operations per second, on an axis from 0 to 8000, for workloads with 50%, 1%, and 0% table scans.]

Page 43:

What type of queries cause page faults?

[Chart: the Unknown/Known by Simple/Complex grid, with a region labeled MongoDB and these query types: data discovery, application search, user-generated search, and multi-range query.]

Page 45:

Track & Remove Slow Queries

• system.profile collection

• MongoDB professor

• Dex

• MongoHQ Slow Query Tracker and Profiler
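For the first option, a minimal sketch of reading slow operations back out of `system.profile`. The helper `buildSlowQueryFilter`, the 100 ms threshold, and the `mydb.messages` namespace are assumptions for illustration; `millis`, `ns`, and `ts` are fields recorded in profiler documents.

```javascript
// Hypothetical helper: filter and sort for reading slow operations
// out of the system.profile collection.
function buildSlowQueryFilter(thresholdMs, namespace) {
  const filter = { millis: { $gt: thresholdMs } };
  if (namespace) {
    filter.ns = namespace; // "<database>.<collection>", assumed name below
  }
  return { filter: filter, sort: { ts: -1 } }; // newest first
}

const slow = buildSlowQueryFilter(100, "mydb.messages");
console.log(JSON.stringify(slow));
// In the mongo shell, after profiling is enabled with
//   db.setProfilingLevel(1, 100)   // record operations slower than 100 ms
// the result could be used as:
//   db.system.profile.find(slow.filter).sort(slow.sort).limit(10)
```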

Page 47:

Buying time with hardware has a limited life

• Don’t get addicted to buying more hardware.

• Before any purchasing decision, always:
  – consider optimization
  – investigate separating and paring data

Page 49:

Thank you!

For any questions: [email protected]

www.mongohq.com | @mongohq

