Partner Webinar: The Scaling Checklist for MongoDB - 100GB and Beyond

Post on 26-Jan-2015

Category:

Technology

DESCRIPTION

MongoHQ knows there is something special about 100 GB of data. Our customers who hit 100 GB are running core pieces of their business on a scalable MongoDB platform. In this presentation, we will walk through a cloud-focused scaling checklist that will help you quickly and securely blow past the 100 GB milestone. Using customer examples and best-practice MongoDB use cases, we'll help prepare you to get to the data size your business needs.

TRANSCRIPT

www.mongohq.com Scaling Checklist for MongoDB

Scaling Checklist for MongoDB

100GB & Beyond

MongoHQ: www.mongohq.com | @mongohq

MongoHQ is a fully-managed platform used by developers to deploy, host and scale open-source databases.

Chris Winslett, chris@mongohq.com

I’ve spoken at a number of MongoDB conferences on optimizing queries. I’ve been with MongoHQ for two years; prior to that I built applications for the education and technical sectors.

TL;DR

• 100GB of data is relatively big data
• MongoDB has comparative advantages
• MongoDB has absolute constraints
• Know the MongoDB gauges
• Surpassing 100GB requires:
  – Understanding absolute constraints
  – Knowledge of application’s data consumption
  – Optimization of data consumption to comparative advantages

Audience Survey

What is your data size? Choose the biggest bucket.

A. < 10GB
B. < 50GB
C. < 75GB
D. < 100GB
E. > 100GB

100 GB Checklist

1. Identify your data behavior
2. Use MongoDB for comparative advantages
3. Know the MongoDB indexing constraints
4. Refactor schema to simplify queries
5. Remove data that does not fit MongoDB
6. Separate hot and cold data
7. Stop using `mongodump`
8. Check your gauges
9. Avoid queries causing page faults
10. Track and monitor slow queries
11. Buying time with hardware

Identify your data behavior

1. Small v. Large – type of data
2. Fast v. Slow – behavior of data
3. Complex v. Simple – type of queries
4. Known v. Unknown – behavior of queries
5. Queuing v. Application data

This can happen at the planning, staging, or production phase.

Patterns of your Data

[Figure: 2×2 grid with Small to Large on one axis and Fast to Slow on the other.]

Modern applications have all patterns

[Figure: the same grid populated with: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]

Where doesn’t MongoDB excel?

[Figure: the same Small/Large by Fast/Slow grid with the same labels: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]

4th dimension is time

[Figure: Small/Large by Fast/Slow grid with labels: main application collections; today’s data; last week’s data.]

Data-types to avoid with MongoDB

[Figure: Small/Large by Fast/Slow grid with labels: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]

What type of queries do you have?

[Figure: 2×2 grid with Unknown to Known on one axis and Simple to Complex on the other.]

Modern applications have all types of queries

[Figure: Unknown/Known by Simple/Complex grid populated with: data discovery; application search; key-value; single-range query; user-generated search; internal metrics; multi-range query.]

Queries to Avoid with MongoDB

[Figure: Unknown/Known by Simple/Complex grid with labels: data discovery; application search; key-value; single-range query; user-generated search; internal metrics; multi-range query.]

4th Dimension is Time

[Figure: Unknown/Known by Simple/Complex grid with labels: real-time core of application; today’s data; last week’s data.]

Queries and MongoDB

[Figure: Unknown/Known by Simple/Complex grid with regions labeled MongoDB, SQL, and Elastic Search (Elastic Search appears twice).]

MongoDB’s Technical Comparative Advantage

• Expressive data structure allows simplification of complex data relationships

• Create simple, known queries and return expressive relationships

• On-the-fly addition of attributes / columns

• Total Cost of Ownership*

MongoDB Indexing Constraints

• Plan on only one index per query (separate `$or` clauses and, from MongoDB 2.6, index intersection are limited exceptions)

• Only one range operator can be used per index

• Range operator must be the last field on index

• Know how to use the right side of indexes
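
The constraints above suggest a rule for ordering the fields of a compound index: equality fields first, the single range field last. A minimal sketch of that rule follows. `buildIndexSpec` is a hypothetical helper written for this illustration, not part of MongoDB or its drivers, and the field names in the usage are assumptions.

```javascript
// Hypothetical helper: derive a compound index spec from a query document,
// placing equality fields first and the single range field last.
const RANGE_OPERATORS = ["$gt", "$gte", "$lt", "$lte"];

// True when a query condition is an object using a range operator.
function isRangeCondition(condition) {
  return (
    condition !== null &&
    typeof condition === "object" &&
    !Array.isArray(condition) &&
    Object.keys(condition).some((op) => RANGE_OPERATORS.includes(op))
  );
}

function buildIndexSpec(query) {
  const equalityFields = [];
  const rangeFields = [];
  for (const [field, condition] of Object.entries(query)) {
    (isRangeCondition(condition) ? rangeFields : equalityFields).push(field);
  }
  // The checklist allows one range operator per index, so refuse more.
  if (rangeFields.length > 1) {
    throw new Error("more than one range field: " + rangeFields.join(", "));
  }
  const spec = {};
  for (const field of equalityFields.concat(rangeFields)) {
    spec[field] = 1;
  }
  return spec;
}

// Usage with assumed field names: the range field moves to the end.
const spec = buildIndexSpec({
  created_at: { $gte: new Date("2014-01-01") },
  account_id: 42,
  status: "open",
});
console.log(JSON.stringify(spec));
```

In the mongo shell, the resulting spec would be the key pattern passed when creating the index. A query with two range fields makes the helper throw, which is a prompt to refactor the schema rather than to add another index.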

What does it mean to optimize?

Scaling to 100GB involves moving queries from complex to simple and from unknown to known.

[Figure: Unknown/Known by Simple/Complex grid with two Start markers and one Finish marker.]

Example of simplifying a query.

Naïve Query:

db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})

Find the most recent messages for a person’s message stream.

Second attempt:

db.messages.find({participant_ids: <id>}).sort({_id: -1})

Best approach

db.users.find({_id: <id>})

Naïve Query

Document

{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <user_id>, recipient_id: <user_id>}

Query

db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})

Second Attempt

Document

{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <sender_id>, recipient_id: <recipient_id>, participant_ids: [<sender_id>,<recipient_id>]}

Query

db.messages.find({participant_ids: <id>}).sort({_id: -1})
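
One way to keep `participant_ids` in step with the sender and recipient is to derive it when the message is written. The sketch below assumes a hypothetical `withParticipants` helper and string ids. The index in the closing comment is an assumption about how the query would be supported, not something the slides state.

```javascript
// Hypothetical helper: add a participant_ids array so that one multikey
// index can serve "messages sent or received by this user" lookups.
function withParticipants(message) {
  const ids = [message.sender_id, message.recipient_id];
  // De-duplicate in case a user messages themselves.
  const participant_ids = ids.filter((id, i) => ids.indexOf(id) === i);
  // Return a copy so the caller's object is left untouched.
  return Object.assign({}, message, { participant_ids });
}

const doc = withParticipants({
  message: "Wow, this pizza is good!",
  sender_id: "u1",
  recipient_id: "u2",
});
console.log(doc.participant_ids);

// In the mongo shell (assumed index, not from the slides):
//   db.messages.createIndex({ participant_ids: 1, _id: -1 })
//   db.messages.find({ participant_ids: "u1" }).sort({ _id: -1 })
```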

Best approach

Document

{ _id: <id>, name: "Clark Kent", recent_messages: [ <…50 denormalized messages…> ]}

Hint: use `$push` with `$sort` and `$slice` to keep the last 50.

Query

db.users.find({_id: <id>})
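
The `$push`, `$sort`, and `$slice` hint can be sketched as an update document. `buildRecentMessagesPush` is a hypothetical helper, and `sent_at` is an assumed timestamp field on each denormalized message; the `$each`, `$sort`, and `$slice` modifiers are MongoDB's own. A small in-memory function imitates the effect so the behavior can be checked without a server.

```javascript
const RECENT_LIMIT = 50;

// Build the update that appends a message, sorts by sent_at ascending,
// and keeps only the newest RECENT_LIMIT entries (a negative $slice
// keeps the tail of the array).
function buildRecentMessagesPush(message) {
  return {
    $push: {
      recent_messages: {
        $each: [message],
        $sort: { sent_at: 1 },
        $slice: -RECENT_LIMIT,
      },
    },
  };
}

// In-memory imitation of what that update does to the array.
// Illustration only; the real work happens on the server.
function applyRecentMessagesPush(recentMessages, update) {
  const mod = update.$push.recent_messages;
  const merged = recentMessages.concat(mod.$each);
  merged.sort((a, b) => a.sent_at - b.sent_at);
  return merged.slice(mod.$slice);
}

let recent = [];
for (let i = 1; i <= 60; i++) {
  const update = buildRecentMessagesPush({ text: "message " + i, sent_at: i });
  recent = applyRecentMessagesPush(recent, update);
}
console.log(recent.length, recent[0].sent_at, recent[49].sent_at);

// In the mongo shell, the same update document would be passed as:
//   db.users.update({ _id: recipientId }, buildRecentMessagesPush(message))
```

After 60 pushes the array holds 50 entries, messages 11 through 60, so the user document stays bounded while the read remains a single `_id` lookup.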

How did we optimize?

We took a known, complex query and made it simple.

[Figure: Unknown/Known by Simple/Complex grid with one Start marker and one Finish marker.]

Methods for Simplifying Queries

• Bucket values

• Create summary attributes

• Pre-compute values

• Use expressive document structures

• Sort and filter at the application level

• Create summary documents

• Divide and measure (more on this later)
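
As one illustration of bucketing and summary documents, the sketch below builds an upsert that counts events per hour inside one document per user per day, so a day's activity can be read with a single `_id` lookup. The helper name, the `_id` format, the collection name in the comment, and the field names are all assumptions made for this example.

```javascript
// Hypothetical helper: build the filter and update for a per-user, per-day
// summary document with one counter per UTC hour plus a running total.
function buildHourlyCounterUpsert(userId, eventDate) {
  const day = eventDate.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const hour = eventDate.getUTCHours();
  const inc = { total: 1 };
  inc["hours." + hour] = 1;
  return {
    filter: { _id: userId + ":" + day },
    update: { $inc: inc },
  };
}

const op = buildHourlyCounterUpsert("u1", new Date("2014-03-05T14:30:00Z"));
console.log(JSON.stringify(op));

// In the mongo shell this would be applied as an upsert:
//   db.daily_summaries.update(op.filter, op.update, { upsert: true })
```

The counters are pre-computed at write time, which trades a slightly heavier write for a simple, known read.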

Remove “unrefactorable” data

[Figure: Small/Large by Fast/Slow grid with labels: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background; and a Redis label.]

Move up and right, or find another tool

[Figure: Unknown/Known by Simple/Complex grid with a MongoDB region and the labels: data discovery; application search; user-generated search; multi-range query.]

4th Dimension is Time

[Figure: Unknown/Known by Simple/Complex grid with labels: real-time core of application; today’s data (fast); last week’s data (slower).]

Separate Data with Cross Purposes

• If today’s data must be fast, and last week’s data can be slow:
  – Expire today’s data using TTL collections
  – Use another database for last week’s data
  – Use high-RAM-ratio, SSD-backed machines for today’s data
  – Use cheaper hardware for last week’s data
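
A TTL index is one way to let today's data expire on its own. The sketch below assumes a `created_at` date field and a 24-hour window, both chosen for illustration; `expireAfterSeconds` is the MongoDB index option that enables TTL behavior. The `isExpired` function only imitates the rule locally. The server removes expired documents in a background task, so deletion is not immediate.

```javascript
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Key pattern and options for a TTL index on an assumed date field.
function buildTtlIndex(field, ttlSeconds) {
  const keys = {};
  keys[field] = 1;
  return { keys, options: { expireAfterSeconds: ttlSeconds } };
}

// Local imitation of the expiry rule: a document becomes eligible for
// removal once its timestamp is older than the TTL window.
function isExpired(doc, field, ttlSeconds, now) {
  return now.getTime() - doc[field].getTime() > ttlSeconds * 1000;
}

const ttl = buildTtlIndex("created_at", ONE_DAY_SECONDS);
console.log(JSON.stringify(ttl));

// In the mongo shell (collection name is hypothetical):
//   db.todays_events.createIndex(ttl.keys, ttl.options)
```

Last week's data would then live in a separate database or collection without the TTL index, on cheaper hardware.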

MongoDB Doesn’t have Joins

Data doesn’t have to be adjacent.

Divide, measure, conquer.

Stop Using `mongodump`

`mongodump` is a long-running table scan that exports all documents. This pushes the working set out of RAM and causes performance issues.

Self-hosting: use the MongoDB MMS and Backup

As-a-service: ask your vendor about backup alternatives

Configure MMS Now!

Avoid Page Faults like the Plague

[Figure: bar chart of MongoDB operations per second (axis from 0 to 8000) for workloads with 50%, 1%, and 0% table scans.]

What type of queries cause page faults?

[Figure: Unknown/Known by Simple/Complex grid with a MongoDB region and the labels: data discovery; application search; user-generated search; multi-range query.]

Track & Remove Slow Queries

• system.profile collection

• MongoDB professor

• Dex

• MongoHQ Slow Query Tracker and Profiler
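
For the first option, the database profiler writes one document per slow operation to `system.profile`, with fields such as `op`, `ns`, `millis`, and `ts`. The sketch below summarizes such entries by namespace and operation. The helper name, the 100 ms threshold, and the sample entries are made up for illustration.

```javascript
// Hypothetical helper: group profiler entries slower than thresholdMs by
// namespace and operation, then order groups by the worst observed time.
function summarizeSlowOps(entries, thresholdMs) {
  const groups = new Map();
  for (const entry of entries) {
    if (entry.millis <= thresholdMs) continue;
    const key = entry.ns + " " + entry.op;
    const group = groups.get(key) || { key, count: 0, maxMillis: 0 };
    group.count += 1;
    group.maxMillis = Math.max(group.maxMillis, entry.millis);
    groups.set(key, group);
  }
  return Array.from(groups.values()).sort((a, b) => b.maxMillis - a.maxMillis);
}

// Made-up sample entries shaped like system.profile documents.
const sample = [
  { op: "query", ns: "app.messages", millis: 850 },
  { op: "query", ns: "app.messages", millis: 120 },
  { op: "update", ns: "app.users", millis: 40 },
  { op: "query", ns: "app.users", millis: 300 },
];
console.log(summarizeSlowOps(sample, 100));

// In the mongo shell, the entries would come from the profiler:
//   db.setProfilingLevel(1, 100)
//   db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 })
```

Grouping by namespace and operation points at the collections whose queries most need the schema refactoring described earlier.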

Buying time with hardware has a limited life

• Don’t get addicted to buying more hardware.

• Before any purchasing decision, always:
  – consider optimization
  – investigate separating and paring down data

Thank you!

For any questions: chris@mongohq.com

www.mongohq.com | @mongohq