MongoDB Versatility: Scaling the MapMyFitness Platform
DESCRIPTION
Chris Merz, Manager of Operations, MapMyFitness
The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.
TRANSCRIPT
MongoDB Versatility: Scaling the MapMyFitness Platform
Sept 14, 2012
Introduction
• MapMyFitness founded in 2007
• Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 11 million registered users
• ~60 million geo-data routes (runs, rides, walks, hikes, etc.)
• Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyWalk, MapMyTri, MapMyHike, MapMyFitness, MapMyRace)
Platform Overview and Background
• Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
• Scaled well to ~2 million users
• Redesigned in Python/Django
• MySQL backend not sufficient
“How to scale from 2.5 to 6 million users?”
Functional Scaling
• Identify high-growth / large-data collections
• Must be able to live outside the existing relational schema
• Integrate via remote resource mapping tables in the RDBMS
• Functional Scaling can facilitate movement towards a Service Oriented Architecture
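A minimal sketch of the "remote resource mapping table" idea, assuming a relational table that stores only a pointer to a payload living in an external store. The table name `route_resource_map`, its columns, and the helper functions are hypothetical and not taken from the MMF schema. SQLite and a plain dict stand in for the RDBMS and the document store so the example runs without any servers.

```python
import sqlite3

# Stand-in for the external document store (e.g. a MongoDB collection).
# A plain dict keeps the sketch runnable without a database server.
route_store = {}

# Relational side: the mapping row holds only a pointer to the payload.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE route_resource_map ("
    " route_id INTEGER PRIMARY KEY,"
    " store_name TEXT NOT NULL,"
    " remote_key TEXT NOT NULL)"
)


def save_route(route_id, remote_key, document):
    """Write the large payload externally and record its location in SQL."""
    route_store[remote_key] = document
    db.execute(
        "INSERT OR REPLACE INTO route_resource_map VALUES (?, ?, ?)",
        (route_id, "routes", remote_key),
    )


def load_route(route_id):
    """Resolve the mapping row, then fetch the payload from the external store."""
    row = db.execute(
        "SELECT remote_key FROM route_resource_map WHERE route_id = ?",
        (route_id,),
    ).fetchone()
    if row is None:
        return None
    return route_store.get(row[0])


save_route(4, "e4da3b7f", {"route_name": "balboa park", "points": []})
print(load_route(4)["route_name"])
```

The relational schema keeps its joins and foreign keys on the small mapping row, while the large, fast-growing payload can move to a different backend without touching the rest of the schema.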
Use Case 1: Route Data Store
• Geo-location data stored in JSON blocks
• MySQL → S3 → File Server → MongoDB
• Initial size of ~500GB, ~18 million objects
• 3 member replica set
• Dedicated iron servers with 24GB RAM
Route presentation example (Lost in Seattle)
Route Data Example
{
id: "e4da3b7fbbce2345d7772b0674a318d5",
updated_date: "2005-07-23 15:47:31",
city: "San Diego",
user_id: "4",
created_date: "2005-07-23 15:47:31",
route_name: "balboa park",
state: "CA",
total_distance: "3.09",
points: [
{lat: 32.7199629309, lng: -117.159318924, type: 1},
{lat: 32.7313715848, lng: -117.159404755, type: 1},
{lat: 32.7314437868, lng: -117.158031464, type: 1},
{lat: 32.7329600157, lng: -117.158074379, type: 1},
{lat: 32.7337903206, lng: -117.158589363, type: 1},
{lat: 32.7370392655, lng: -117.158589363, type: 1},
{lat: 32.7388802817, lng: -117.158074379, type: 1},
{lat: 32.7203239866, lng: -117.159147263, type: 1},...
]
};
Solution Summary
Migration Pattern:
• RESTful API modified to use Mongo PHP driver
• Implemented a 'pass thru' migration function
• Batch 'backfill' migrations via pass-thru
• Data transform handled in PHP code
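The pass-through pattern above can be sketched roughly as follows. This is an illustration in Python, not the PHP code used at MMF. The class `PassThroughStore`, the `transform` function, and the legacy field names (`route_key`, `name`, `coords`) are all hypothetical, and plain dicts stand in for MongoDB and the legacy backend.

```python
def transform(legacy_row):
    """Hypothetical transform: reshape a legacy record into the document form."""
    return {
        "id": legacy_row["route_key"],
        "route_name": legacy_row["name"],
        "points": [
            {"lat": lat, "lng": lng, "type": 1}
            for lat, lng in legacy_row["coords"]
        ],
    }


class PassThroughStore:
    """Read from the new store first; on a miss, migrate from the legacy store."""

    def __init__(self, new_store, legacy_store):
        self.new_store = new_store        # dict standing in for MongoDB
        self.legacy_store = legacy_store  # dict standing in for the old backend
        self.migrated = 0

    def get(self, key):
        doc = self.new_store.get(key)
        if doc is not None:
            return doc
        legacy_row = self.legacy_store.get(key)
        if legacy_row is None:
            return None
        doc = transform(legacy_row)
        self.new_store[key] = doc  # pass-through write: migrate on first read
        self.migrated += 1
        return doc

    def backfill(self, keys):
        """Batch migration: drive every key through the same pass-through path."""
        for key in keys:
            self.get(key)


legacy = {
    "r1": {"route_key": "r1", "name": "balboa park", "coords": [(32.72, -117.16)]},
    "r2": {"route_key": "r2", "name": "lost in seattle", "coords": []},
}
store = PassThroughStore(new_store={}, legacy_store=legacy)
store.get("r1")                # migrated lazily on first access
store.backfill(list(legacy))   # r1 is already present, so only r2 is migrated
print(store.migrated, sorted(store.new_store))
```

Because the backfill reuses the read path, hot records migrate on demand while a batch job sweeps up the cold ones, and both go through a single transform.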
SAN storage and MongoDB
• Needed to quickly expand available disk
• Implemented high-end SAN subsystem
• Impressive I/O performance with MongoDB
• Migration to SAN painless thanks to OpLog
• Easily expandable due to the use of XFS
• Over 100 million objects, ~7TB of data
“Gotchas” a.k.a. Lessons Learned
• Pay attention to potential document size (Utilize GridFS for larger objects)
• Allocate enough RAM for indexes! (Especially important for large data collections)
• File dump backups may not scale for TB+ size datasets. (Utilize delayed and 'hidden' member for DR)
• Evaluate filesystem choice carefully (hint: XFS)
Use Case 2: Django Session Store
• Django sessions not scaling in MySQL
• Modified core methods to use MongoDB
• Cutover of new data (Test for Mongo data, fallback to MySQL)
• Migration of data via export/import (Simple Python transform script using pymongo)
Use Case 3: Athletic Live Tracking
• Beta feature utilized TT + MySQL (did not scale for large events)
• Required to be “burstable” for Live Events (deployable in 'The Cloud')
• Data size relatively small (compared to Routes DB)
• “Live” data, no archiving required
Use Case 3: Athletic Live Tracking
• RS Cloud, 3+n MongoDB replica set
• Quickly scalable via MongoDB replication
• Highly optimized, indexes for every query
• Low administration overhead (vs MySQL)
“Gotchas”
• Know your application (tune indexes and 'find()' ops accordingly)
• Know your driver (Python pooling driver defaults way too low)
As a DBA: Ease of Administration
• Replication made elegant (as compared with MySQL)
• Ridiculously simple to add add'l members
• Be sure to run InitialSync from a secondary
rs.add( { "host" : "livetrack_db09", "initialSync" : { "state" : 2 } } )
Use Case 4: Micro-Messaging Framework
• Initial use case providing 'micro-goals' (user-defined stats aggregation)
• MongoDB for persistence of aggregates
• Python server + RabbitMQ (AMQP)
• Implemented between Django and MySQL (service subscribes to 'interesting' stats)
• Horizontally scalable into the cloud, with base capacity on dedicated iron
• Messaging system expanded to handle real-time course analysis and push notifications
Indexing Patterns or “Know Your App”
• Proper indexing critical to performance at scale
• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
• Avoid un-indexed queries at all costs (no. really. quickest way to crater your app)
• Onus on DevOps to match application to indexes (know your query profile, never assume)
• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
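To make the 'covered query' idea concrete, here is a deliberately simplified check written as plain Python. The function `is_covered` is a hypothetical helper, not part of any MongoDB driver. It encodes only the basic rule: every filter field and every returned field must belong to the same index, and `_id` counts as returned unless the projection excludes it. Real-world conditions such as array-valued (multikey) fields are ignored here; on a live system the query plan from `explain()` is the authority.

```python
def is_covered(index_keys, query_fields, projection):
    """Simplified test: can the query be answered from one index alone?

    index_keys   -- field names of a single (possibly compound) index
    query_fields -- field names used in the query filter
    projection   -- dict of field -> 1/0, in the style passed to find()
    """
    indexed = set(index_keys)
    included = {field for field, flag in projection.items() if flag}
    if not included:
        # An empty or exclusion-only projection returns document fields
        # that are not in the index, so the documents must be read.
        return False
    if projection.get("_id", 1):
        included.add("_id")  # _id is returned unless explicitly excluded
    return set(query_fields) <= indexed and included <= indexed


index = ["user_id", "created_date"]

# Covered: filter and output stay inside the index, and _id is excluded.
print(is_covered(index, ["user_id"], {"user_id": 1, "created_date": 1, "_id": 0}))

# Not covered: _id is returned by default and is not part of this index.
print(is_covered(index, ["user_id"], {"user_id": 1, "created_date": 1}))

# Not covered: route_name is not in the index.
print(is_covered(index, ["user_id"], {"route_name": 1, "_id": 0}))
```

The second case is the common trap: forgetting to exclude `_id` in the projection quietly turns an index-only query into one that fetches documents.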
Use Case 5: API Logging DB
• MongoDB is great for logging (especially if you log in JSON format!)
• Good application for capped collections (cap by data size, or TTL)
• Running with 'safe mode' off for speed (fire-n-forget logging can reduce latency)
• Cloud servers are a good fit for logging apps
Capped Collections
• Used for retaining a fixed amount of data (based on data size, not number of rows)
• Utilizes FIFO method for pruning collection (Especially useful for data that devalues with age)
• TTL Collections (2.2) age out data based on a retention date limit (useful for a variety of data types)
Gotcha!
Explicitly create the capped collection before any data is put into the system to avoid auto-creation of the collection
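The size-based FIFO behaviour of a capped collection can be imitated in a few lines of Python. This is an in-memory illustration only, not how MongoDB implements capping. The class `CappedLog` is hypothetical, and the size of each entry is approximated by its JSON length rather than its BSON size. In MongoDB itself the collection would be created up front with `createCollection` and the `capped` and `size` options, which is the step the gotcha above refers to.

```python
import json
from collections import deque


class CappedLog:
    """In-memory imitation of a size-capped, insertion-ordered collection."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries = deque()  # oldest entry sits on the left

    def insert(self, doc):
        # JSON length is only a rough proxy for the stored document size.
        size = len(json.dumps(doc).encode("utf-8"))
        if size > self.max_bytes:
            raise ValueError("document larger than the cap")
        # FIFO pruning: evict the oldest entries until the new one fits.
        while self.used_bytes + size > self.max_bytes:
            _, old_size = self.entries.popleft()
            self.used_bytes -= old_size
        self.entries.append((doc, size))
        self.used_bytes += size

    def documents(self):
        return [doc for doc, _ in self.entries]


log = CappedLog(max_bytes=100)
for i in range(10):
    log.insert({"req": i, "path": "/routes"})

# Only the most recent entries that fit within the cap remain.
print([doc["req"] for doc in log.documents()])
```

The cap is expressed in bytes, so the number of retained entries varies with document size, matching the "data size, not number of rows" point above.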
Monitoring MongoDB at MMF
• Monitor for real-time system events (Faster response time = less impact)
• Track historical performance data trends (Useful for predictive failure analysis and scaling need projections)
• MMS – MongoDB Monitoring Service (Now our default visual metrics system)
• Zabbix open source monitoring
• Makoomi Zabbix plugins for MongoDB
• Mongostat – realtime troubleshooting godsend
Conclusion
• MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
• MongoDB fits well into both dedicated and virtual architecture environments.
• Low maintenance overhead compared to traditional RDBMS.
• Provides the horizontal scaling path required for Internet-sized applications.