MongoDB Versatility: Scaling the MapMyFitness Platform

MongoDB Versatility Scaling the MapMyFitness Platform Sept 14, 2012

Upload: mongodb

Post on 05-Dec-2014


DESCRIPTION

Chris Merz, Manager of Operations, MapMyFitness

The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.

TRANSCRIPT

Page 1: MongoDB Versatility: Scaling the MapMyFitness Platform

MongoDB Versatility: Scaling the MapMyFitness Platform

Sept 14, 2012

Page 2: MongoDB Versatility: Scaling the MapMyFitness Platform

Introduction

• MapMyFitness founded in 2007
• Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 11 million registered users
• ~60 million geo-data routes (runs, rides, walks, hikes, etc.)
• Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyWalk, MapMyTri, MapMyHike, MapMyFitness, MapMyRace)

Page 3: MongoDB Versatility: Scaling the MapMyFitness Platform

Platform Overview and Background

• Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
• Scaled well to ~2 million users
• Redesigned in Python/Django
• MySQL backend not sufficient

“How to scale from 2.5 to 6 million users?”

Page 4: MongoDB Versatility: Scaling the MapMyFitness Platform

Functional Scaling

• Identify high-growth / large-data collections
• Must be able to live outside the existing relational schema
• Integrate via remote resource mapping tables in the RDBMS
• Functional Scaling can facilitate movement towards a Service Oriented Architecture
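A remote resource mapping table can be sketched roughly as follows. This is a minimal illustration, not the MMF schema: the table name `route_resource`, its columns, and the helper `load_route` are hypothetical, sqlite3 stands in for the production RDBMS, and a plain dict stands in for the external document store.

```python
import sqlite3

# Stand-in for the external document store (MongoDB in the talk).
document_store = {
    "e4da3b7fbbce2345d7772b0674a318d5": {"route_name": "balboa park", "city": "San Diego"},
}

# Stand-in for the relational database; sqlite3 is used only so the sketch runs anywhere.
rdbms = sqlite3.connect(":memory:")
rdbms.execute(
    "CREATE TABLE route_resource ("
    " route_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " remote_key TEXT NOT NULL)"  # key of the document held outside the RDBMS
)
rdbms.execute(
    "INSERT INTO route_resource (route_id, user_id, remote_key) VALUES (?, ?, ?)",
    (1, 4, "e4da3b7fbbce2345d7772b0674a318d5"),
)


def load_route(route_id):
    """Resolve a relational route_id to its externally stored document."""
    row = rdbms.execute(
        "SELECT remote_key FROM route_resource WHERE route_id = ?", (route_id,)
    ).fetchone()
    if row is None:
        return None  # no mapping row: the route is unknown to the RDBMS
    return document_store.get(row[0])
```

The relational side keeps only small, joinable metadata, while the large payload lives in the other store and is reached through `remote_key`.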

Page 5: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 1: Route Data Store

• Geo-location data stored in JSON blocks
• MySQL → S3 → File Server → MongoDB
• Initial size of ~500GB, ~18 million objects
• 3 member replica set
• Dedicated iron servers with 24GB RAM

Page 6: MongoDB Versatility: Scaling the MapMyFitness Platform

Route presentation example (Lost in Seattle)

Page 7: MongoDB Versatility: Scaling the MapMyFitness Platform

Route Data Example

{
  id: "e4da3b7fbbce2345d7772b0674a318d5",
  updated_date: "2005-07-23 15:47:31",
  city: "San Diego",
  user_id: "4",
  created_date: "2005-07-23 15:47:31",
  route_name: "balboa park",
  state: "CA",
  total_distance: "3.09",
  points: [
    {lat: 32.7199629309, lng: -117.159318924, type: 1},
    {lat: 32.7313715848, lng: -117.159404755, type: 1},
    {lat: 32.7314437868, lng: -117.158031464, type: 1},
    {lat: 32.7329600157, lng: -117.158074379, type: 1},
    {lat: 32.7337903206, lng: -117.158589363, type: 1},
    {lat: 32.7370392655, lng: -117.158589363, type: 1},
    {lat: 32.7388802817, lng: -117.158074379, type: 1},
    {lat: 32.7203239866, lng: -117.159147263, type: 1},...
  ]
};

Page 8: MongoDB Versatility: Scaling the MapMyFitness Platform

Solution Summary

Migration Pattern:

• RESTful API modified to use Mongo PHP driver
• Implemented a 'pass thru' migration function
• Batch 'backfill' migrations via pass-thru
• Data transform handled in PHP code
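The pass-thru and backfill steps above can be sketched as below. The original was implemented in PHP inside the REST API; this Python version is only an illustration, with dict-like objects standing in for the old and new stores. The function names and the `transform` step (casting `total_distance` to a number) are hypothetical, since the slides do not say what the real transform did.

```python
def transform(legacy_record):
    """Hypothetical transform from the legacy shape to the new document shape."""
    doc = dict(legacy_record)
    doc["total_distance"] = float(doc["total_distance"])
    return doc


def get_route(route_id, new_store, legacy_store):
    """Read from the new store; on a miss, migrate the record on the fly."""
    doc = new_store.get(route_id)
    if doc is not None:
        return doc  # already migrated
    legacy = legacy_store.get(route_id)
    if legacy is None:
        return None  # unknown in both stores
    doc = transform(legacy)
    new_store[route_id] = doc  # the 'pass thru' write
    return doc


def backfill(new_store, legacy_store):
    """Batch-migrate everything not yet touched by live traffic."""
    migrated = 0
    for route_id in list(legacy_store):
        if route_id not in new_store:
            get_route(route_id, new_store, legacy_store)
            migrated += 1
    return migrated
```

Because the backfill reuses the same pass-thru function as live reads, a record is migrated the same way whichever path reaches it first.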

Page 9: MongoDB Versatility: Scaling the MapMyFitness Platform

SAN storage and MongoDB

• Needed to quickly expand available disk
• Implemented high-end SAN subsystem
• Impressive I/O performance with MongoDB
• Migration to SAN painless thanks to OpLog
• Easily expandable due to the use of XFS
• Over 100 million objects, ~7TB of data

Page 10: MongoDB Versatility: Scaling the MapMyFitness Platform

“Gotchas” a.k.a. Lessons Learned

• Pay attention to potential document size (Utilize GridFS for larger objects)
• Allocate enough RAM for indexes! (Especially important for large data collections)
• File dump backups may not scale for TB+ size datasets. (Utilize delayed and 'hidden' member for DR)
• Evaluate filesystem choice carefully (hint: xfs)
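For the delayed, 'hidden' DR member, a replica set member document might look like the sketch below. The helper name, host name, and one-hour delay are assumptions for illustration. The field names follow the MongoDB 2.x-era replica set configuration, where the delay field is `slaveDelay`; later releases renamed it to `secondaryDelaySecs`.

```python
def delayed_hidden_member(member_id, host, delay_seconds):
    """Build a replica set member document for a hidden, delayed DR node."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,    # can never be elected primary
        "hidden": True,   # not advertised to application clients
        "slaveDelay": delay_seconds,  # applies the oplog this far behind the primary
    }


# Hypothetical host name and a one-hour delay window.
dr_member = delayed_hidden_member(3, "routes_dr01:27017", 3600)
```

A delayed member gives a window in which an accidental drop or bad write can still be recovered from a copy that has not yet applied it.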

Page 11: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 2: Django Session Store

• Django sessions not scaling in MySQL
• Modified core methods to use MongoDB
• Cutover of new data (Test for Mongo data, fallback to MySQL)
• Migration of data via export/import (Simple python transform script using pymongo)

Page 12: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 3: Athletic Live Tracking

• Beta feature utilized TT + MySQL (did not scale for large events)
• Required to be “burstable” for Live Events (deployable in 'The Cloud')
• Data size relatively small (compared to Routes DB)
• “Live” data, no archiving required

Page 13: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 3: Athletic Live Tracking

• RS Cloud, 3+n MongoDB replica set
• Quickly scalable via MongoDB replication
• Highly optimized, indexes for every query
• Low administration overhead (vs MySQL)

“Gotchas”
• Know your application (tune indexes and 'find()' ops accordingly)
• Know your driver (python pooling driver defaults way too low)
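One way to make the pool size explicit rather than relying on a driver default is to set it in the connection string. The sketch below only builds the URI. `replicaSet` and `maxPoolSize` are standard MongoDB connection string options, but the host names, set name, and the value 200 are assumptions, and drivers from the period of this talk may have used a different keyword for the pool size.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, max_pool_size):
    """Build a MongoDB URI that states the connection pool size explicitly."""
    options = urlencode({"replicaSet": replica_set, "maxPoolSize": max_pool_size})
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical hosts and pool size; tune the number against real concurrency.
uri = replica_set_uri(
    ["livetrack_db01:27017", "livetrack_db02:27017"], "livetrack", 200
)
```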

Page 14: MongoDB Versatility: Scaling the MapMyFitness Platform

As a DBA: Ease of Administration

• Replication made elegant (as compared with MySQL)
• Ridiculously simple to add add'l members
• Be sure to run InitialSync from a secondary

rs.add( { "host" : "livetrack_db09", "initialSync" : { "state" : 2 } } )

Page 15: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 4: Micro-Messaging Framework

• Initial use case providing 'micro-goals' (user-defined stats aggregation)
• MongoDB for persistence of aggregates
• Python server + RabbitMQ (AMQP)
• Implemented between Django and MySQL (service subscribes to 'interesting' stats)
• Horizontally scalable into the cloud, with base capacity on dedicated iron
• Messaging system expanded to handle real-time course analysis and push notifications

Page 16: MongoDB Versatility: Scaling the MapMyFitness Platform

Indexing Patterns or “Know Your App”

• Proper indexing critical to performance at scale
• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
• Avoid un-indexed queries at all costs (no. really. quickest way to crater your app)
• Onus on DevOps to match application to indexes (know your query profile, never assume)
• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
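The 'covered query' idea can be illustrated with a small checker that compares a query and projection against an index. It is a deliberately simplified model and not how MongoDB's planner decides: it ignores operators such as `$or`, embedded fields, and array (multikey) indexes, and the function name and example index are hypothetical.

```python
def is_covered(index_fields, query, projection):
    """Return True if the index alone could answer the query (simplified model)."""
    indexed = set(index_fields)
    # Every field the query filters on must be part of the index.
    if not set(query) <= indexed:
        return False
    # An empty or exclusion-only projection returns full documents.
    if not any(projection.values()):
        return False
    returned = {field for field, flag in projection.items() if flag}
    # _id comes back by default unless the projection turns it off.
    if projection.get("_id", 1):
        returned.add("_id")
    return returned <= indexed


# Hypothetical compound index on a routes collection.
index = ["user_id", "created_date"]
covered = is_covered(index, {"user_id": "4"}, {"_id": 0, "user_id": 1, "created_date": 1})
not_covered = is_covered(index, {"user_id": "4"}, {"user_id": 1})  # _id still returned
```

The second call shows a common miss: forgetting to exclude `_id` forces a document fetch even though every other field is in the index.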

Page 17: MongoDB Versatility: Scaling the MapMyFitness Platform

Use Case 5: API Logging DB

• MongoDB is great for logging (especially if you log in JSON format!)
• Good application for capped collections (cap by data size, or TTL)
• Running with 'safe mode' off for speed (fire-n-forget logging can reduce latency)
• Cloud servers are a good fit for logging apps

Page 18: MongoDB Versatility: Scaling the MapMyFitness Platform

Capped Collections

• Used for retaining a fixed amount of data (based on data size, not number of rows)
• Utilizes FIFO method for pruning collection (Especially useful for data that devalues with age)
• TTL Collections (2.2) age out data based on a retention date limit (useful for a variety of data types)

Gotcha!

Explicitly create the capped collection before any data is put into the system to avoid auto-creation of collection
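The size-based FIFO behavior described above can be mimicked with a toy in-memory model. This is not MongoDB's implementation: the class name is hypothetical and JSON length stands in for the stored document size. With pymongo, the real collection would be created up front with something like `db.create_collection(name, capped=True, size=...)`.

```python
import json
from collections import deque


class CappedLog:
    """Toy model of a capped collection: fixed byte budget, oldest entries pruned first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = deque()  # (document, size) pairs in insertion order

    def insert(self, doc):
        size = len(json.dumps(doc).encode("utf-8"))
        if size > self.max_bytes:
            raise ValueError("document larger than the cap")
        # Evict from the front (oldest) until the new document fits.
        while self.used_bytes + size > self.max_bytes:
            _, old_size = self._entries.popleft()
            self.used_bytes -= old_size
        self._entries.append((doc, size))
        self.used_bytes += size

    def find(self):
        """Return documents in insertion order, oldest first."""
        return [doc for doc, _ in self._entries]
```

The cap is expressed in bytes, so the number of retained entries depends on how large each log document is.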

Page 19: MongoDB Versatility: Scaling the MapMyFitness Platform

Monitoring MongoDB at MMF

• Monitor for real-time system events (Faster response time = less impact)
• Track historical performance data trends (Useful for predictive failure analysis and scaling need projections)
• MMS – MongoDB Monitoring Service (Now our default visual metrics system)
• Zabbix open source monitoring
• Mikoomi Zabbix plugins for MongoDB
• Mongostat – realtime troubleshooting godsend

Page 20: MongoDB Versatility: Scaling the MapMyFitness Platform

Conclusion

• MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
• MongoDB fits well into both dedicated and virtual architecture environments.
• Low maintenance overhead compared to traditional RDBMS.
• Provides the horizontal scaling path required for Internet Sized applications.

Page 21: MongoDB Versatility: Scaling the MapMyFitness Platform

We're Hiring!

http://www.mapmyfitness.com/careers

mongo-[email protected]