Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013
DESCRIPTION
Over the past year, mobile in-app feedback provider Apptentive has scaled MongoDB on AWS from a single machine to a sharded cluster handling thousands of operations per second across several hundred gigabytes. This session, packed with demos, code, and actual performance numbers, shares the lessons learned along the way. Topics include picking the right tools for the job (instance sizing and selection, I/O choices, and topological choices); using Chef/AWS OpsWorks and AWS CloudFormation to deploy and scale; monitoring with Amazon CloudWatch and MMS; managing backups with Amazon EBS snapshots; and using Amazon Elastic MapReduce alongside MongoDB instances.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
DAT209 - Scaling MongoDB on Amazon Web Services
Michael Saffitz, CTO & Co-Founder, Apptentive
November 15, 2013
Nice to Meet You!
Apptentive
The easiest way for anyone with an app to talk with their customers
Follow at: @apptentive
Mike Saffitz
CTO, Co-Founder, Apptentive
Follow at: @msaffitz
Apptentive & AWS
• api.apptentive.com, www.apptentive.com
• Elastic Load Balancer
• Web Servers – EC2: 6 x c1.medium
• S3, CloudFront
• VPN Server – EC2: m1.small
• Sharded MongoDB Cluster – EC2: 9 Instances
• Redis – EC2: m1.medium
• CI & Chef – EC2: m1.medium, m1.small
• Stats & Logging – EC2: 2 x m1.medium, m1.small
• CloudWatch, Elastic MapReduce
• apptentive.com/blog
• Elastic Beanstalk, RDS
• IAM, Route53, Virtual Private Cloud
Agenda
• Why Scale MongoDB on AWS?
• Planning
• Deploying
• Maintaining
Why Scale MongoDB on AWS?
Easy • Flexible • Cost Effective
• Simple to Administer
• Broad Language Support
• Friendly Query Syntax
• Rapidly Scale on Demand
• Well Documented
• Supports a Diverse Set of Scenarios
• Fine-Grained Control over Price & Performance
• Competitive TCO
Why Not Scale MongoDB on AWS?
• Your Data Is Predominantly Relational in Nature – Consider RDS
• You Don’t Want to Incur the Administrative Costs – Consider DynamoDB or Hosted Alternatives
1. Planning
Planning Checklist
• Topologies – MongoDB
– AWS
• Instance Selection
• Storage
MongoDB Topologies: Single Server
mongod
MongoDB Topologies: Single ReplicaSet w/ Arbiter
mongod (primary) • mongod (secondary) • mongod (arbiter)
• Secondary: Contains a Full Copy of the Data on the Primary – Can Be Used for Reads
• Arbiter: Only Participates in Voting to Elect a New Primary (Must Have an Odd # of Voting Members)
• Automatic Failover
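The odd-number rule follows from how elections work: a candidate needs votes from a strict majority of the replica set's voting members. The arithmetic below is a minimal sketch of that rule (plain Python, not MongoDB code). It shows that a fourth voter raises the majority without adding fault tolerance, which is why an arbiter is a cheap way to turn a two-member set into three voters.

```python
def majority(voting_members: int) -> int:
    """Votes a candidate needs to become primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def tolerable_failures(voting_members: int) -> int:
    """Voting members that can be lost while an election can still succeed."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # primary + secondary + arbiter = 3 voting members
    for n in range(2, 6):
        print(f"{n} voters: majority={majority(n)}, tolerates {tolerable_failures(n)} failure(s)")
```

Three voters tolerate one failure; four voters still tolerate only one, while five tolerate two.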
MongoDB Topologies: Single ReplicaSet
mongod (primary) • mongod (secondary) • mongod (secondary)
• Automatic Failover
• Scale Across Instance Types
• Data Replicated Within ReplicaSet
MongoDB Topologies: Sharded Cluster
• Shards: Each Shard Is a ReplicaSet – mongod (primary), mongod (secondary), mongod (secondary) …
• Config Servers: config, config, config
• App Servers: Each Runs a mongos …
• Data Partitioned Across Shards
• Data Replicated Within Shard
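Range-based partitioning is what lets mongos send a query that includes the shard key to a single shard instead of all of them. The sketch below is a simplified model, not mongos internals: the chunk table, shard names, and string shard key are hypothetical. A real mongos caches chunk metadata read from the config servers rather than holding a hard-coded list.

```python
import bisect
from typing import List, Optional

# Hypothetical chunk table for a string shard key. Each chunk covers
# [lower bound, next lower bound); "" stands in for MinKey.
CHUNK_LOWER_BOUNDS: List[str] = ["", "g", "n", "t"]
CHUNK_OWNERS: List[str] = ["shard0", "shard1", "shard0", "shard2"]


def shard_for_key(key: str) -> str:
    """Find the chunk whose range contains `key` and return its owning shard."""
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    return CHUNK_OWNERS[idx]


def shards_for_query(key: Optional[str]) -> List[str]:
    """Targeted query if the shard key is given; otherwise scatter-gather to every shard."""
    if key is not None:
        return [shard_for_key(key)]
    return sorted(set(CHUNK_OWNERS))


if __name__ == "__main__":
    print(shards_for_query("kiwi"))  # ['shard1']
    print(shards_for_query(None))    # ['shard0', 'shard1', 'shard2']
```

The practical takeaway is the same as on the slide: queries that carry the shard key touch one shard, and queries that don't touch all of them.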
MongoDB Topologies: Picking One
• Single Server? Not For Production
• Don’t Shard Prematurely – ReplicaSets can take you surprisingly far
• … But Don’t Wait Too Long to Shard – Collections over 256GB may have issues migrating to shards
– Rebalancing consumes IO and can be very slow
• Pick the Right Instance Size for Your Topology… – We’re going to get to this in a moment
AWS Topologies: AZs & Regions
• Obvious: Distribute Across Availability Zones in a Region – No Single Point of Failure
• Distributing Across Regions – Shard per Region versus Shards Across Regions
– Considerations: Replication Latency, Data Transfer Costs, Administration Costs, Speedup from Geo-Based Tag-Aware Sharding
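One quick way to check an AZ layout against "no single point of failure" is to ask whether losing any one Availability Zone still leaves a voting majority. The helper below is a hypothetical sketch; the AZ names are examples only.

```python
from collections import Counter
from typing import Sequence


def survives_single_az_loss(member_azs: Sequence[str]) -> bool:
    """member_azs lists the AZ of each voting member of one ReplicaSet.

    Returns True if the members left after losing any single AZ still form
    a strict majority, so the set can still elect or keep a primary.
    """
    total = len(member_azs)
    needed = total // 2 + 1
    per_az = Counter(member_azs)
    return all(total - lost >= needed for lost in per_az.values())


if __name__ == "__main__":
    print(survives_single_az_loss(["us-east-1a", "us-east-1b", "us-east-1c"]))  # True
    print(survives_single_az_loss(["us-east-1a", "us-east-1a", "us-east-1b"]))  # False
```

The second layout fails because losing us-east-1a removes two of three voters, leaving no majority.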
Selecting an Instance: Considerations
• Compute
• Memory
• EBS Optimized?
• Cost
Selecting an Instance: Compute
• Compute Is Unlikely to Be a Significant Factor – Exceptions: Heavy Use of Map/Reduce or the Aggregation Framework
– MongoDB 2.4 switched to the V8 JavaScript engine, which allows concurrent JavaScript operations
– Important! Only run 64-bit builds; 32-bit builds are limited to ~2GB of data
• Real World Numbers on m1.large:
Selecting an Instance: Memory
• Estimate Necessary Working Set
– db.runCommand( { serverStatus: 1, workingSet: 1 } ): Is pagesInMemory * 4KB approaching total RAM? Is overSeconds decreasing or small?
– db.stats()
• Pick the Instance that Matches
• Monitor on MMS
– Page Faults (abstract)
– Queues (better)
– Response Times (best)
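The working-set check and the "pick the instance that matches" step can be sketched as below. It assumes a serverStatus excerpt (requested with workingSet: 1, as in MongoDB 2.4) already parsed into a dict, and pagesInMemory is multiplied by the 4KB page size. The 1.5x headroom factor is an arbitrary assumption, and the RAM table is illustrative only, listing a few 2013-era instance types.

```python
from typing import Dict, Optional

PAGE_SIZE_BYTES = 4 * 1024  # pagesInMemory is counted in 4KB pages

# Illustrative RAM sizes (GiB) for a few 2013-era instance types;
# check current AWS documentation before relying on these numbers.
INSTANCE_RAM_GIB: Dict[str, float] = {
    "m1.large": 7.5,
    "m1.xlarge": 15.0,
    "m2.2xlarge": 34.2,
    "m2.4xlarge": 68.4,
}


def working_set_gib(server_status: dict) -> float:
    """Estimate the working set from serverStatus output requested with workingSet: 1."""
    pages = server_status["workingSet"]["pagesInMemory"]
    return pages * PAGE_SIZE_BYTES / 2**30


def pick_instance(ws_gib: float, headroom: float = 1.5) -> Optional[str]:
    """Smallest listed instance whose RAM covers the working set times the (assumed) headroom."""
    need = ws_gib * headroom
    fits = [(ram, name) for name, ram in INSTANCE_RAM_GIB.items() if ram >= need]
    return min(fits)[1] if fits else None


if __name__ == "__main__":
    # Hypothetical serverStatus excerpt
    status = {"workingSet": {"pagesInMemory": 2_000_000}}
    ws = working_set_gib(status)
    print(f"working set ~{ws:.1f} GiB -> {pick_instance(ws)}")
```

Treat the result as a starting point and then confirm it with MMS queue and response-time metrics, as the slide suggests.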
Selecting an Instance: EBS Optimization
• Run EBS Optimized When Available – Especially with Provisioned IOPS
• Volume Configuration Impacts I/O Performance Far More than Instance Selection
Storage
• Instance Storage
– Non-Durable
– Fast but Inconsistent Performance
– Can’t Use Snapshots for Backups
• “Standard” EBS
– Slower
– More Variable Performance
• Provisioned IOPS EBS
– Consistent Performance
– Don’t Underprovision: Watch Queue Length
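Queue length, IOPS, and latency are tied together by Little's law: average outstanding I/Os ≈ IOPS × average latency (in seconds). The sketch below applies it. The figures are hypothetical. A queue that stays deeper than your latency target implies is a sign that the volume is underprovisioned.

```python
def expected_queue_depth(iops: float, avg_latency_ms: float) -> float:
    """Little's law: outstanding requests = completion rate x time each spends in the system."""
    return iops * avg_latency_ms / 1000.0


def latency_at_queue_depth_ms(iops: float, queue_depth: float) -> float:
    """Rearranged Little's law: average latency implied by an observed queue depth."""
    return queue_depth * 1000.0 / iops


if __name__ == "__main__":
    # Hypothetical volume delivering 2000 IOPS
    print(expected_queue_depth(2000, 5))        # -> 10.0
    print(latency_at_queue_depth_ms(2000, 40))  # -> 20.0
```

If the volume is already delivering its provisioned IOPS, a longer queue can only mean longer per-request latency.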
Storage
• RAID 10? Just use LVM on RAID 0 – More: http://blog.mongohq.com/debunking-myth-of-raid-10-as-best-practice-on-aws/
• Use XFS or Ext4
• Mount with noatime, noexec, nodiratime
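The storage advice can be sketched as two small helpers. One does the ideal-case arithmetic for striping EBS volumes (RAID 0). The other builds an /etc/fstab entry with the XFS/ext4 and mount-option recommendations above. The device path, volume count, size, and IOPS are hypothetical.

```python
from typing import Tuple

MOUNT_OPTIONS = ("defaults", "noatime", "nodiratime", "noexec")


def raid0_totals(volumes: int, size_gib: int, iops_each: int) -> Tuple[int, int]:
    """Ideal RAID 0 totals: capacity and IOPS add up across striped volumes.

    Real throughput is also capped by the instance's EBS bandwidth, and losing
    any one volume loses the whole array.
    """
    return volumes * size_gib, volumes * iops_each


def fstab_line(device: str, mount_point: str, fs_type: str = "xfs") -> str:
    """Build an /etc/fstab entry using the recommended filesystems and mount options."""
    if fs_type not in ("xfs", "ext4"):
        raise ValueError("use XFS or ext4")
    return f"{device} {mount_point} {fs_type} {','.join(MOUNT_OPTIONS)} 0 0"


if __name__ == "__main__":
    print(raid0_totals(4, 100, 1000))  # (400, 4000)
    # Hypothetical LVM logical volume striped over the EBS volumes
    print(fstab_line("/dev/vg0/mongodata", "/data"))
```

Because a striped set spans several volumes, it also has to be snapshotted as a unit, which is the RAID consistency concern raised under Backups.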
Selecting an Instance: Summary
1. Lead with Working Set Requirements
2. Validate Compute is Sufficient
3. Enable EBS Optimized if Available
4. Use Provisioned IOPS EBS
5. (Confirm Cost is Acceptable)
2. Deploying
It’s Easy. Let me show you.
Scaling Deployment
• DevOps: Go for ‘bilities: – Reliability, Predictability, Repeatability, and Auditability
• The Result is Easy Replaceability and Scalability – Build your infrastructure so it can be treated like an appliance
– The impact of your decisions during planning will be significantly mitigated
DevOps Tools
• AWS Marketplace AMIs – Preconfigured with MongoDB best practices
– Do-it-yourself scaling to ReplicaSets / Shards
– Helpful, but not a DevOps Solution
• AWS CloudFormation – Templates for Resource Setup & Initial Configuration
• Chef, Puppet, Ansible, SaltStack, & More – AWS OpsWorks, but limited by chef-solo
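To illustrate the CloudFormation approach, the sketch below builds a minimal template for one EBS-optimized mongod host with a Provisioned IOPS data volume. The resource names, the AmiId parameter, the instance type, AZ, size, and IOPS are hypothetical placeholders. The resource types and properties are standard CloudFormation, but check the current reference before deploying. Installing and configuring mongod would still be left to a tool such as Chef.

```python
import json


def mongod_template(instance_type: str = "m1.large", az: str = "us-east-1a",
                    size_gib: int = 100, iops: int = 1000) -> dict:
    """Minimal template: one EBS-optimized instance plus an attached io1 volume."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {"AmiId": {"Type": "String"}},
        "Resources": {
            "MongoInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": instance_type,
                    "AvailabilityZone": az,
                    "EbsOptimized": True,
                },
            },
            "MongoDataVolume": {
                "Type": "AWS::EC2::Volume",
                "Properties": {
                    # The volume must live in the same AZ as the instance
                    "AvailabilityZone": az,
                    "Size": size_gib,
                    "VolumeType": "io1",  # Provisioned IOPS
                    "Iops": iops,
                },
            },
            "MongoDataAttachment": {
                "Type": "AWS::EC2::VolumeAttachment",
                "Properties": {
                    "InstanceId": {"Ref": "MongoInstance"},
                    "VolumeId": {"Ref": "MongoDataVolume"},
                    "Device": "/dev/sdf",
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(mongod_template(), indent=2))
```

Generating the template from code keeps instance sizing in one place, so that scaling up becomes a parameter change and a stack update.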
Security
• Run in a VPC – Complications: Cross Region, Multiple Source Ingress
• Use KeyFiles & Roles – KeyFiles: Internal authentication for cluster members
– Roles allow fine-grained, user-level access control
• Advanced: – Kerberos support in MongoDB 2.4
– SSL Support in Custom Builds & MongoDB Enterprise
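A cluster keyfile is a shared secret that every mongod and mongos is started with (for example via --keyFile). It must contain 6 to 1024 characters from the base64 set, and on UNIX it must not have group or world permissions. The sketch below generates one under those constraints. The file path and the 756-byte default are assumptions; 756 random bytes encode to 1008 base64 characters.

```python
import base64
import os
import secrets
import tempfile


def write_keyfile(path: str, num_random_bytes: int = 756) -> str:
    """Write a random base64 keyfile readable only by its owner and return its contents."""
    key = base64.b64encode(secrets.token_bytes(num_random_bytes)).decode("ascii")
    if not 6 <= len(key) <= 1024:
        raise ValueError("keyfile content must be 6-1024 base64 characters")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(key)
    os.chmod(path, 0o600)  # mongod rejects keyfiles that are group/world accessible
    return key


if __name__ == "__main__":
    # Hypothetical location; copy the same file to every cluster member
    target = os.path.join(tempfile.mkdtemp(), "mongodb-keyfile")
    print(len(write_keyfile(target)), "characters written to", target)
```

Distribute the same file to every member through your configuration-management tool rather than generating it per host.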
3. Maintaining
Monitoring: MongoDB Monitoring Service
• Very Good, Free Holistic Monitoring – Important: ReplLag, Page Faults, Lock %
– Informative: OpCounters, Connections, Queue Lengths
• Includes Basic Alerting of Host Failures and Metric Thresholds
• Query Profiler Details Slow Queries – db.setProfilingLevel(1)
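MMS graphs ReplLag for you, but the underlying idea is simple: how far each secondary's last applied operation trails the primary's. The sketch below computes it from a status document shaped like the rs.status() / replSetGetStatus output (members with name, stateStr, and optimeDate). The sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def replication_lag_seconds(rs_status: dict) -> Dict[str, float]:
    """Seconds each SECONDARY trails the PRIMARY, from rs.status()-shaped output."""
    members = rs_status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


if __name__ == "__main__":
    now = datetime(2013, 11, 15, 12, 0, 0)
    # Hypothetical status document
    status = {
        "members": [
            {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=3)},
            {"name": "db3:27017", "stateStr": "ARBITER"},
        ]
    }
    print(replication_lag_seconds(status))  # {'db2:27017': 3.0}
```

Arbiters hold no data, so they are skipped.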
Monitoring: Amazon CloudWatch
• Detailed Resource Level Monitoring – Important: Queue Length, Read/Write Latencies
• Versatile alerting based on Amazon Simple Notification Service (SNS)
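CloudWatch reports EBS activity per period as metrics such as VolumeQueueLength, VolumeTotalReadTime, and VolumeReadOps. Average latency per operation is derived by dividing total time by operation count. The sketch below does that arithmetic and adds a simplified model of a threshold alarm over consecutive evaluation periods. Real alarms also handle missing data and publish to SNS. The datapoints are hypothetical.

```python
from typing import Sequence


def avg_latency_ms(total_time_seconds: float, ops: float) -> float:
    """Average EBS latency per operation for one period.

    total_time_seconds: Sum of VolumeTotalReadTime (or VolumeTotalWriteTime).
    ops: Sum of VolumeReadOps (or VolumeWriteOps) over the same period.
    """
    return 0.0 if ops == 0 else total_time_seconds * 1000.0 / ops


def alarm_fires(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> bool:
    """Simplified GreaterThanThreshold check: the last N datapoints all exceed the threshold."""
    recent = list(datapoints)[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(v > threshold for v in recent)


if __name__ == "__main__":
    print(avg_latency_ms(12.0, 4000))          # 3.0 ms per read
    print(alarm_fires([2, 9, 11, 12], 8, 3))   # True: queue length above 8 for 3 periods
```

Requiring several consecutive breaches keeps a single I/O burst from paging anyone.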
Backups
• Delayed Secondary – Questionable as a primary backup strategy
• Dump/Restore – Impractical for larger deployments
• MongoDB Service – Managed, Secure, Point in Time. Unclear suitability for larger deployments
– Expensive
• Snapshots – Fast, Easy, Scalable. Pay Attention to Consistency (RAID, Shards)
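Snapshot consistency matters in the two places the slide names. First, every volume in a RAID set must be captured at the same logical moment. Second, on a sharded cluster, chunks must not migrate mid-backup. One common ordering, roughly following MongoDB's filesystem-snapshot backup procedure, is:

1. Stop the balancer.
2. Fsync-lock one member of each shard.
3. Start snapshots of all of that member's volumes.
4. Unlock the member.
5. Restart the balancer.

The sketch below encodes only that ordering. The callables are hypothetical hooks you would wire to sh.stopBalancer(), db.fsyncLock()/db.fsyncUnlock(), and the EC2 CreateSnapshot API. Config server backup is omitted. Locking shards one at a time gives per-shard consistency, not a single cluster-wide point in time.

```python
from typing import Callable, Dict, List


def snapshot_cluster(
    shard_volumes: Dict[str, List[str]],
    stop_balancer: Callable[[], None],
    start_balancer: Callable[[], None],
    fsync_lock: Callable[[str], None],
    fsync_unlock: Callable[[str], None],
    snapshot_volume: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Snapshot every EBS volume of one member per shard while its writes are locked.

    shard_volumes maps a shard name to the volume IDs backing the member chosen
    for backup (all RAID members listed together). Returns shard -> snapshot IDs.
    """
    snapshots: Dict[str, List[str]] = {}
    stop_balancer()  # no chunk migrations while shards are captured
    try:
        for shard, volumes in shard_volumes.items():
            fsync_lock(shard)  # flush to disk and block writes on that member
            try:
                # EBS snapshots are point-in-time when initiated, so unlocking
                # once all of this member's snapshots have started is enough
                snapshots[shard] = [snapshot_volume(v) for v in volumes]
            finally:
                fsync_unlock(shard)
    finally:
        start_balancer()  # always re-enable, even if a step failed
    return snapshots


if __name__ == "__main__":
    log: List[str] = []

    def fake_snapshot(volume_id: str) -> str:
        log.append(f"snapshot {volume_id}")
        return f"snap-{volume_id}"

    result = snapshot_cluster(
        {"rs0": ["vol-a", "vol-b"]},
        stop_balancer=lambda: log.append("stop balancer"),
        start_balancer=lambda: log.append("start balancer"),
        fsync_lock=lambda s: log.append(f"lock {s}"),
        fsync_unlock=lambda s: log.append(f"unlock {s}"),
        snapshot_volume=fake_snapshot,
    )
    print(log)
    print(result)
```

The nested try/finally blocks are the important design choice: a failed snapshot must never leave a member locked or the balancer disabled.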
Easy Snapshot-Based Backups With Mongolly
• Automatic topology detection, snapshotting, and snapshot management for EBS-backed MongoDB Databases
• Easy as: $ mongolly backup
• https://github.com/msaffitz/mongolly
Conclusions
• MongoDB + AWS =
• Options For All Deployment / Workload Sizes – I/O typically the focal point for optimization
• Investing in a DevOps Strategy + Solution Makes It Nearly Effortless