MongoATL: How SourceForge is Using MongoDB
SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat (Geeknet)
How SourceForge is Using MongoDB
Rick Copeland (@rick446)
SF.net “BlackOps”: FossFor.us
User Editable!
Web 2.0! (ish)
Not Ugly!
Moving to NoSQL
FossFor.us used CouchDB (NoSQL)
  “Just adding new fields was trivial, and was happening all the time” (Mark Ramm)
Scaling up to the level of SF.net needs research
  CouchDB
  MongoDB
  Tokyo Cabinet/Tyrant
  Cassandra
  ... and others
Rewriting “Consume”
Most traffic on SF.net hits 3 types of pages:
  Project Summary
  File Browser
  Download
Pages are read-mostly, with infrequent updates from the “Develop” side of SF.net
Original goal was 1 MongoDB document per project
  Later split release data out, because some projects have lots of releases
Periodic updates via RSS and AMQP from “Develop”
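A minimal sketch of the “split release data” step, in plain Python with dicts standing in for MongoDB documents. The field names (`_id`, `releases`, `project_id`) and the `split_releases` helper are hypothetical, not SourceForge's actual schema; the idea is that the project document stays small while each release lives in its own document that points back to its project.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Split a one-document-per-project record into project + release docs.

    Returns (project, release_docs). The input is not modified.
    All field names are hypothetical.
    """
    project = deepcopy(project_doc)
    releases = project.pop("releases", [])
    # Each release becomes its own document pointing back at its project.
    release_docs = [dict(release, project_id=project["_id"]) for release in releases]
    return project, release_docs


project_doc = {
    "_id": "example-project",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

project, releases = split_releases(project_doc)
```

With releases in their own collection, a Project Summary page can load the small project document alone and fetch only the releases it displays, for example by querying on `project_id`.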
Deployment Architecture
Load Balancer / Proxy
4 web servers, each running Apache mod_wsgi / TurboGears (TG) 2.0 with a local MongoDB slave
Master DB Server running the MongoDB master
Gobble Server
Develop
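The data flow in this diagram can be modeled with a toy sketch, assuming updates are written only to the master and each web server reads from its own local slave. Plain dicts stand in for the databases, and `apply_update`, `replicate` and `read_for_page` are hypothetical names; real MongoDB replication is asynchronous, which the explicit `replicate()` call is meant to make visible.

```python
# Toy model: dicts stand in for the master and the four local slaves.
master = {}
local_slaves = [{} for _ in range(4)]  # one per web server


def apply_update(project_id, doc):
    """Write path (assumed): updates go to the master only."""
    master[project_id] = doc


def replicate():
    """Stand-in for master/slave replication, which runs asynchronously in reality."""
    for slave in local_slaves:
        slave.clear()
        slave.update(master)


def read_for_page(web_server, project_id):
    """Read path: a web server only consults its own local slave."""
    return local_slaves[web_server].get(project_id)
```

A page rendered between an update and the next replication pass shows the old data; that lag is the cost of serving reads from slaves.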
Deployment Architecture (revised)
Load Balancer / Proxy
4 web servers running Apache mod_wsgi / TG 2.0, with no local MongoDB slaves
Master DB Server running the MongoDB master
Gobble Server
Develop
Scalability is good
Single-node performance is good, too
SF.net Downloads
Allows non-SF.net projects to use the SourceForge mirror network
Stats calculated in Hadoop and stored/served from MongoDB
Same deployment architecture as Consume (4 web servers, 1 DB server)
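A sketch of what serving pre-computed stats can look like, assuming the batch job writes one document per project per day. The document shape, the made-up sample values and `downloads_between` are hypothetical; the filter in the generator mirrors the MongoDB query shown in the docstring, and ISO date strings are used because they sort chronologically as plain strings.

```python
# Made-up sample rows: one pre-computed document per project per day.
daily_stats = [
    {"project": "example", "day": "2011-01-01", "downloads": 120},
    {"project": "example", "day": "2011-01-02", "downloads": 95},
    {"project": "example", "day": "2011-01-03", "downloads": 143},
    {"project": "other", "day": "2011-01-01", "downloads": 7},
]


def downloads_between(stats, project, start, end):
    """Sum a project's downloads for start <= day <= end (ISO date strings).

    The filter mirrors a MongoDB query such as
    {"project": project, "day": {"$gte": start, "$lte": end}}.
    """
    return sum(
        doc["downloads"]
        for doc in stats
        if doc["project"] == project and start <= doc["day"] <= end
    )
```

Because the heavy computation already happened offline, the web tier only runs a cheap range query and a sum.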
Allura (SF.net “beta” devtools)
Rewrite developer tools with new architecture
Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
Single MongoDB replica set manually sharded by project
Release early & often
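One way to read “manually sharded by project” is application-level routing: the code, not MongoDB, decides which database holds a project's data. This sketch assumes a fixed number of databases with hypothetical names (`project-data-0`, `project-data-1`, and so on) and hashes the project's short name to pick one; it is an illustration, not Allura's actual scheme.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of per-project databases


def shard_for_project(shortname, num_shards=NUM_SHARDS):
    """Return the (hypothetical) database name that holds a project's data.

    hashlib gives the same answer in every process; the built-in hash()
    for str is randomized per interpreter run unless PYTHONHASHSEED is set.
    """
    digest = hashlib.sha256(shortname.encode("utf-8")).hexdigest()
    return "project-data-%d" % (int(digest, 16) % num_shards)
```

Because the modulus changes when `num_shards` changes, adding a database would move most projects; storing each project's assignment explicitly avoids that, at the cost of an extra lookup.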
What We Liked
Performance, performance, performance
  Easily handle 90% of SF.net traffic from 1 DB server and 4 web servers
Schemaless server allows fast schema evolution in development, making many migrations unnecessary
Replication is easy, making scalability and backups easy
  Keep a “backup slave” running
  Kill the backup slave, copy off the database, bring the slave back up
  Automatic re-sync with the master
Query Language
  You mean I can have performance without map-reduce?
GridFS
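GridFS stores a file as one metadata document (in the `fs.files` collection by default) plus a series of chunk documents (in `fs.chunks`), each holding a slice of the bytes along with its `files_id` and sequence number `n`. The sketch below builds those documents in plain Python to show the layout; it is not a driver, `uuid4` stands in for an `ObjectId`, and the chunk size is a parameter (current drivers default to 255 KiB).

```python
import uuid


def to_gridfs_documents(data, filename, chunk_size=255 * 1024):
    """Split `data` (bytes) into GridFS-style documents.

    Returns (file_doc, chunk_docs), shaped like entries in the default
    fs.files and fs.chunks collections. uuid4 stands in for an ObjectId.
    """
    file_id = uuid.uuid4().hex
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    return file_doc, chunk_docs


def from_gridfs_documents(file_doc, chunk_docs):
    """Reassemble the file by concatenating its chunks in `n` order."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    data = b"".join(chunk["data"] for chunk in ordered)
    if len(data) != file_doc["length"]:
        raise ValueError("missing or extra chunks")
    return data
```

Splitting into chunks keeps each document well under MongoDB's document size limit and lets a reader fetch only the chunks covering a byte range.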
Pitfalls
Too-large documents
  Store less per document
  Return only a few fields
Ignoring indexing
  Watch your server log; bad queries show up there
Ignoring your data’s schema
Using many databases when one will do
Using too many queries
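To make “using too many queries” concrete: fetching related documents one at a time costs one round trip each, while a single `$in` query fetches them all at once. `CountingCollection` is a hypothetical in-memory stand-in that understands only equality and `$in` on one field, and counts the queries it receives. Since `$in` does not promise to return documents in the order of the id list, the batched version restores the order in the application.

```python
class CountingCollection:
    """Hypothetical in-memory stand-in for a collection (not a MongoDB client).

    Supports only single-field filters: {"field": value} or
    {"field": {"$in": [...]}}. Counts every query it receives.
    """

    def __init__(self, docs):
        self.docs = docs
        self.queries = 0

    def find(self, query):
        self.queries += 1
        field, condition = next(iter(query.items()))
        if isinstance(condition, dict) and "$in" in condition:
            wanted = set(condition["$in"])
            return [d for d in self.docs if d.get(field) in wanted]
        return [d for d in self.docs if d.get(field) == condition]


users = CountingCollection([{"_id": i, "name": "user%d" % i} for i in range(5)])
author_ids = [3, 1, 4]

# Anti-pattern: one query per id (N round trips).
one_by_one = [users.find({"_id": i})[0] for i in author_ids]
queries_one_by_one = users.queries

# Better: one $in query, then restore the requested order in Python.
users.queries = 0
by_id = {d["_id"]: d for d in users.find({"_id": {"$in": author_ids}})}
batched = [by_id[i] for i in author_ids]
queries_batched = users.queries
```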
Ming: an “Object-Document Mapper?”
Your data has a schema
  Your database can define and enforce it
  It can live in your application (as with MongoDB)
  Nice to have the schema defined in one place in the code
Sometimes you need a “migration”
  Changing the structure/meaning of fields
  Adding indexes
  Sometimes lazy, sometimes eager
Queuing up all your updates can be handy
Python dicts are nice; objects are nicer
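A minimal sketch of one schema change handled both ways, assuming a hypothetical `schema_version` field and a change from a single `author` string to an `authors` list. A lazy migration upgrades each document when it is read; an eager one rewrites every stored document up front. The helpers are illustrative only and are not Ming's migration API.

```python
def migrate(doc):
    """Return a copy of `doc` upgraded to version 2 (hypothetical schema).

    Version 1 stored a single "author" string; version 2 stores an
    "authors" list. A missing "schema_version" means version 1.
    """
    doc = dict(doc)
    if doc.get("schema_version", 1) < 2:
        author = doc.pop("author", None)
        doc["authors"] = [author] if author else []
        doc["schema_version"] = 2
    return doc


def load_lazily(raw_doc):
    # Lazy: upgrade on read; the stored document keeps its old shape
    # until the application next saves it.
    return migrate(raw_doc)


def migrate_eagerly(collection):
    # Eager: rewrite every stored document up front (here, a list in place).
    collection[:] = [migrate(doc) for doc in collection]
```

Lazy migration avoids a long maintenance window but leaves mixed versions on disk; eager migration leaves the data uniform at the cost of touching every document.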
Ming Concepts
Inspired by SQLAlchemy
Group of classes to which you map your collections
Each class defines its schema, including indexes
Convenience methods for loading/saving objects and ensuring indexes are created
Migrations
Unit of Work: great for web applications
MIM (“Mongo in Memory”): nice for unit tests
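The Unit of Work idea can be sketched generically: changes are queued in memory during a request and written out in a single `flush()` at the end, so repeated saves of the same object become one write, and nothing half-finished reaches the store if the request fails before flushing. This is a hypothetical illustration of the pattern, with a dict standing in for a collection, not Ming's implementation.

```python
class UnitOfWork:
    """Generic Unit of Work sketch (not Ming's implementation).

    Changes are queued in memory and applied to `store` (a dict standing
    in for a collection keyed by _id) only when flush() is called.
    """

    def __init__(self, store):
        self.store = store
        self.dirty = {}       # _id -> latest pending version of a document
        self.deleted = set()  # _ids queued for deletion

    def save(self, doc):
        # Saving the same _id twice leaves only one pending write.
        self.deleted.discard(doc["_id"])
        self.dirty[doc["_id"]] = dict(doc)

    def delete(self, doc_id):
        self.dirty.pop(doc_id, None)
        self.deleted.add(doc_id)

    def flush(self):
        # Typically called once, at the end of a web request.
        for doc_id, doc in self.dirty.items():
            self.store[doc_id] = doc
        for doc_id in self.deleted:
            self.store.pop(doc_id, None)
        self.dirty.clear()
        self.deleted.clear()
```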
Ming Example
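from ming import Session, schema
from ming.orm import (MappedClass, ThreadLocalORMSession, FieldProperty,
                      ForeignIdProperty, RelationProperty)

# Unbound session; bind it to a datastore before running queries
session = ThreadLocalORMSession(doc_session=Session())

class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('WikiComment')

# Illustrative counterpart class that RelationProperty('WikiComment') joins to
class WikiComment(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_comment'
    _id = FieldProperty(schema.ObjectId)
    page_id = ForeignIdProperty('WikiPage')
    text = FieldProperty(str)
    page = RelationProperty('WikiPage')

MappedClass.compile_all()  # Lets Ming know about the mapping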
Open Source
Ming: http://sf.net/projects/merciless/
  MIT License
Allura: http://sf.net/p/allura/
  Apache License
Future Work
mongos
New Allura tools
Migrating legacy SF.net projects to Allura
Stats all in MongoDB rather than Hadoop?
Better APIs to access your project data
Questions?