Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

DAT209 - Scaling MongoDB on Amazon Web Services

Michael Saffitz, CTO & Co-Founder, Apptentive

November 15, 2013

Nice to Meet You!

Apptentive

The easiest way for anyone with an app to talk with their customers

Follow at: @apptentive • Connect at: [email protected]

Mike Saffitz

CTO, Co-Founder, Apptentive

Follow at: @msaffitz • Connect at: [email protected]

Apptentive & AWS

• api.apptentive.com, www.apptentive.com (Elastic Load Balancer)

• Web Servers – EC2: 6 x c1.medium

• S3, CloudFront

• VPN Server – EC2: m1.small

• Sharded MongoDB Cluster – EC2: 9 Instances

• Redis – EC2: m1.medium

• CI & Chef – EC2: m1.medium, m1.small

• Stats & Logging – EC2: 2x m1.medium, m1.small

• CloudWatch

• Elastic MapReduce

• apptentive.com/blog – Elastic Beanstalk, RDS

• IAM

• Route53

• Virtual Private Cloud

Agenda

• Why Scale MongoDB on AWS?

• Planning

• Deploying

• Maintaining

Why Scale MongoDB on AWS?

Easy • Flexible • Cost Effective

• Simple To Administer

• Broad Language Support

• Friendly Query Syntax

• Rapidly Scale On Demand

• Well Documented

• Supports Diverse Set of Scenarios

• Fine Grain Control Over Price & Performance

• Competitive TCO

Why Not Scale MongoDB on AWS?

• Your Data is Predominately Relational in Nature – Consider RDS

• Don’t Want to Incur the Administrative Costs – Consider DynamoDB, Hosted Alternatives

1. Planning

Planning Checklist

• Topologies – MongoDB

– AWS

• Instance Selection

• Storage

MongoDB Topologies: Single Server

mongod

MongoDB Topologies: Single ReplicaSet w/ Arbiter

• mongod (primary)

• mongod (secondary) – Contains Full Copy of Data on the Primary; Can be Used for Reads

• mongod (arbiter) – Arbiter Only Participates in Voting to Elect a New Primary (Must Have Odd #)

• Automatic Failover
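
The odd-number rule on this slide follows from majority voting. A minimal sketch, assuming one vote per member and hypothetical hostnames; the dict only mimics the shape of a replica set configuration document (`arbiterOnly` is the real member option, everything else is illustrative):

```python
# Illustrative replica set layout; hostnames are hypothetical placeholders.
replica_set_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db-a.example.internal:27017"},
        {"_id": 1, "host": "db-b.example.internal:27017"},
        # The arbiter holds no data and only votes in elections.
        {"_id": 2, "host": "db-c.example.internal:27017", "arbiterOnly": True},
    ],
}


def votes_needed(member_count):
    """Votes required for a strict majority (assumes one vote per member)."""
    return member_count // 2 + 1


def tolerated_failures(member_count):
    """Members that can be lost while a primary can still be elected."""
    return member_count - votes_needed(member_count)


members = len(replica_set_config["members"])
print(members, votes_needed(members), tolerated_failures(members))
```

Under these assumptions a two-member set tolerates no failures, and a fourth member tolerates no more failures than three do, which is why the arbiter is added to reach an odd count.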

MongoDB Topologies: Single ReplicaSet

• mongod (primary)

• mongod (secondary)

• mongod (secondary)

• Automatic Failover

• Scale Across Instance Types

• Data Replicated Within ReplicaSet

MongoDB Topologies: Sharded Cluster

• Shard – mongod (primary), mongod (secondary), mongod (secondary)

• Shard – mongod (primary), mongod (secondary), mongod (secondary)

• …

• Config Servers – config, config, config

• App Server – mongos

• App Server – mongos

• …

• (Legend: mongod process)

• Data Partitioned Across Shards

• Data Replicated Within Shard
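
To make "Data Partitioned Across Shards" concrete, here is a toy range-based router. It is only a sketch of the idea: the shard names and split points are invented, and a real mongos routes using chunk metadata held by the config servers rather than a hard-coded list.

```python
import bisect

# Toy chunk map; split points and shard names are made up for illustration.
SPLIT_POINTS = ["h", "p"]                 # boundaries between key ranges
SHARDS = ["shard0", "shard1", "shard2"]   # one more shard than split points


def route(shard_key):
    """Return the shard whose key range contains shard_key.

    Ranges are [lower, upper): a key equal to a split point belongs
    to the shard on the right of that split point.
    """
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key)]


for key in ["apple", "mango", "zebra"]:
    print(key, "->", route(key))
```

Each shard in the diagram is itself a replica set, so a document routed to `shard1` would still be replicated to that shard's secondaries.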

MongoDB Topologies: Picking One

• Single Server? Not For Production

• Don’t Shard Prematurely – ReplicaSets can take you surprisingly far

• … But Don’t Wait Too Long to Shard – Collections over 256GB may have issues migrating to shards

– Rebalancing consumes IO and can be very slow

• Pick the Right Instance Size for Your Topology… – We’re going to get to this in a moment

AWS Topologies: AZs & Regions

• Obvious: Distribute Across Availability Zones in a Region – No Single Point of Failure

• Distributing Across Regions – Shard per Region versus Shards Across Regions

– Considerations: Replication Latency, Data Transfer Costs, Administration Costs, Speedup from Geo-Based Tag Aware Sharding
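
The "No Single Point of Failure" goal can be checked mechanically. A minimal sketch, assuming one vote per member; the member names are placeholders and the zone names are only examples of the Availability Zone naming style:

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(members, zones):
    """Assign members to zones round-robin."""
    return dict(zip(members, cycle(zones)))


def survives_any_zone_loss(placement):
    """True if a voting majority remains after losing any single zone."""
    total = len(placement)
    majority = total // 2 + 1
    per_zone = Counter(placement.values())
    return all(total - count >= majority for count in per_zone.values())


members = ["primary", "secondary-1", "secondary-2"]
three_zones = spread_across_zones(members, ["us-east-1a", "us-east-1b", "us-east-1c"])
two_zones = spread_across_zones(members, ["us-east-1a", "us-east-1b"])
print(survives_any_zone_loss(three_zones), survives_any_zone_loss(two_zones))
```

With three members in only two zones, one zone necessarily holds two of them, so losing that zone leaves the set without a majority.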

Selecting an Instance: Considerations

Compute Memory

EBS Optimized?

Cost

Selecting an Instance: Compute

• Unlikely to Be a Significant Factor – Exceptions: Heavy use of Map/Reduce, Aggregation Framework

– MongoDB 2.4 added concurrent JavaScript execution via the V8 engine

– Important! Only run 64-bit builds; 32-bit builds are limited to ~2GB of data

• Real World Numbers on m1.large:

Selecting an Instance: Memory

• Estimate Necessary Working Set – db.runCommand( { serverStatus: 1, workingSet: 1 } )

– Is pagesInMemory * 4k approaching total RAM? Is overSeconds decreasing / small?

– db.stats()

• Pick the Instance that Matches

• Monitor on MMS – Page Faults (abstract)

– Queues (better)

– Response Times (best)
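
The working-set arithmetic from this slide can be sketched as follows. The function takes a plain dict shaped like the `workingSet` section returned by the `serverStatus` command above; the sample numbers are invented for illustration and are not measurements.

```python
PAGE_SIZE_BYTES = 4096  # the "4k" page size used on the slide


def working_set_report(working_set, total_ram_bytes):
    """Estimate working-set size and its share of RAM."""
    estimated_bytes = working_set["pagesInMemory"] * PAGE_SIZE_BYTES
    return {
        "estimated_bytes": estimated_bytes,
        "fraction_of_ram": estimated_bytes / total_ram_bytes,
        # Seconds between the oldest and newest pages in memory; a small or
        # shrinking value suggests pages are being evicted quickly.
        "over_seconds": working_set["overSeconds"],
    }


# Hypothetical sample: 1.5 million resident pages on a host with 7.5 GiB of RAM.
sample = {"pagesInMemory": 1_500_000, "overSeconds": 900}
report = working_set_report(sample, total_ram_bytes=int(7.5 * 1024**3))
print(report)
```

A `fraction_of_ram` close to 1 together with a falling `over_seconds` is the situation the slide warns about: the working set no longer fits in memory.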

Selecting an Instance: EBS Optimization

• Run EBS Optimized When Available – Especially with Provisioned IOPS

• Volume Config Impacts IO Perf Far More than Instance Selection

Storage

• Instance Storage – Non-Durable

– Fast But Inconsistent Performance

– Can’t Use Snapshots for Backups

• “Standard” EBS – Slower

– More Variable Performance

• Provisioned IOPS EBS – Consistent Performance

– Don’t Under-Provision: Watch Queue Length

Storage

• RAID 10? Just use LVM on RAID 0 – More: http://blog.mongohq.com/debunking-myth-of-raid-10-as-best-practice-on-aws/

• Use XFS or Ext4

• Mount with noatime, noexec, nodiratime
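
As one way to apply the mount options above, this sketch builds an `/etc/fstab` line. The device path and mount point are hypothetical placeholders; only the three options come from the slide.

```python
def fstab_entry(device, mount_point, fs_type="xfs"):
    """Build an /etc/fstab line using the mount options from the slide."""
    options = ["defaults", "noatime", "noexec", "nodiratime"]
    # Fields: device, mount point, filesystem type, options, dump, fsck order.
    return f"{device} {mount_point} {fs_type} {','.join(options)} 0 0"


# Hypothetical LVM logical volume mounted at a hypothetical data directory.
entry = fstab_entry("/dev/vg0/mongodb", "/data/db")
print(entry)
```

Passing `fs_type="ext4"` gives the equivalent entry for the other filesystem the slide recommends.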

Selecting an Instance: Summary

1. Lead with Working Set Requirements

2. Validate Compute is Sufficient

3. Enable EBS Optimized if Available

4. Use Provisioned IOPS EBS

5. (Confirm Cost is Acceptable)
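
Step 1 of this summary can be sketched as a lookup: pick the smallest instance whose memory covers the working set plus headroom. The memory figures below are illustrative values for 2013-era instance types and should be checked against current AWS documentation; the 25% headroom factor is an assumption, not a recommendation from the talk.

```python
# Memory in GiB per instance type; illustrative, verify before relying on it.
INSTANCE_MEMORY_GIB = {
    "m1.large": 7.5,
    "m1.xlarge": 15.0,
    "m2.2xlarge": 34.2,
    "m2.4xlarge": 68.4,
}


def smallest_fitting_instance(working_set_gib, headroom=1.25):
    """Return the smallest listed instance that fits the working set, or None."""
    needed = working_set_gib * headroom
    candidates = [
        (memory, name)
        for name, memory in INSTANCE_MEMORY_GIB.items()
        if memory >= needed
    ]
    return min(candidates)[1] if candidates else None


print(smallest_fitting_instance(10))
```

A result of `None` means no single listed instance fits, which is one signal that sharding is worth considering. Steps 2 through 5 (compute, EBS optimization, Provisioned IOPS, cost) still need to be checked separately.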

2. Deploying

It’s Easy. Let me show you.

Scaling Deployment

• DevOps: Go for ‘bilities: – Reliability, Predictability, Repeatability, and Auditability

• The Result is Easy Replaceability and Scalability – Build your infrastructure so it can be treated like an appliance

– The impact of your decisions during planning will be significantly mitigated

DevOps Tools

• AWS Marketplace AMIs – Preconfigured with MongoDB best practices

– Do-it-yourself scaling to ReplicaSets / Shards

– Helpful, but not a DevOps Solution

• AWS CloudFormation – Templates for Resource Setup & Initial Configuration

• Chef, Puppet, Ansible, SaltStack, & More – AWS OpsWorks, but limited by chef-solo

Security

• Run in a VPC – Complications: Cross Region, Multiple Source Ingress

• Use KeyFiles & Roles – KeyFiles: Internal authentication for cluster members

– Roles allow for user-level fine grain access control

• Advanced: – Kerberos support in MongoDB 2.4

– SSL Support in Custom Builds & MongoDB Enterprise
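
A keyfile is just shared secret text that every cluster member reads. A minimal sketch of generating one, assuming the usual constraints of base64 characters and a length between 6 and 1024; the 741-byte default is an assumption chosen to stay under that limit:

```python
import base64
import os


def generate_keyfile_contents(random_bytes=741):
    """Return base64 text suitable for use as a shared keyfile."""
    return base64.b64encode(os.urandom(random_bytes)).decode("ascii")


contents = generate_keyfile_contents()
print(len(contents))
```

The same file would then be copied to every member and made readable only by the user that runs mongod, since the file is the credential.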

3. Maintaining

Monitoring: MongoDB Monitoring Service

• Very Good, Free Holistic Monitoring – Important: ReplLag, Page Faults, Lock %

– Informative: OpCounters, Connections, Queue Lengths

• Includes Basic Alerting of Host Failures and Metric Thresholds

• Query Profiler Details Slow Queries – db.setProfilingLevel(1)

Monitoring: Amazon CloudWatch

• Detailed Resource Level Monitoring – Important: Queue Length, Read/Write Latencies

• Versatile alerting based on Amazon Simple Notification Service (SNS)

Backups

• Delayed Secondary – Questionable as a primary backup strategy

• Dump/Restore – Impractical for larger deployments

• MongoDB Service – Managed, Secure, Point in Time. Unclear suitability for larger deployments

– Expensive

• Snapshots – Fast, Easy, Scalable. Pay Attention to Consistency (RAID, Shards)
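
The consistency concern for snapshots comes down to ordering: pause writes, snapshot every volume, then resume. The sketch below shows only that ordering. All three hooks are hypothetical callables supplied by the caller; in practice they might wrap `db.fsyncLock()`, `db.fsyncUnlock()`, and an EBS snapshot request. It is not a description of how any particular tool works.

```python
def consistent_snapshot(lock_writes, unlock_writes, snapshot_volume, volume_ids):
    """Snapshot every volume behind one mongod while writes are paused."""
    lock_writes()
    try:
        # Every volume of a RAID set is captured inside the same locked window.
        return [snapshot_volume(volume_id) for volume_id in volume_ids]
    finally:
        # Always release the lock, even if a snapshot call fails.
        unlock_writes()


# Stand-in hooks that only record the order of operations.
events = []


def fake_snapshot(volume_id):
    events.append("snapshot " + volume_id)
    return "snap-of-" + volume_id


snapshots = consistent_snapshot(
    lock_writes=lambda: events.append("lock"),
    unlock_writes=lambda: events.append("unlock"),
    snapshot_volume=fake_snapshot,
    volume_ids=["vol-a", "vol-b"],
)
print(events)
```

For a sharded cluster the same idea has to be applied per shard, with extra care that chunks are not migrating between shards while the snapshots are taken.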

Easy Snapshot-Based Backups With Mongolly

• Automatic topology detection, snapshotting, and snapshot management for EBS-backed MongoDB Databases

• Easy as: $ mongolly backup

• https://github.com/msaffitz/mongolly

Conclusions

• MongoDB + AWS =

• Options For All Deployment / Workload Sizes – I/O typically the focal point for optimization

• Investing in a DevOps Strategy + Solution Makes It Near Effortless

