The Case for Using MongoDB in a Social Game - Animal Land

Description:
Presentation for Mongo Tokyo 2012, about Animal Land, which uses MongoDB as its primary database.

Transcript:
The Case for using MongoDB in Social Game
Masakazu Matsushita, CyberAgent, Inc.
About Me
• Masakazu Matsushita
• @matsukaz
• CyberAgent, Inc. - SocialApps Division
  - Ameba Pico (2011/01~)
  - Animal Land (2011/03~)
• DevLOVE Staff
Agenda
• About Animal Land
• System Overview
• Why MongoDB?
• Considered Points
• Troubles
• Current Problems
About Animal Land
Demo
http://apps.facebook.com/animal-land/
Development Period
• Started in 2011/03
• First meeting was held on 3.11
• Released on 2011/12/09
Team Members
• Producer × 2
• Designer × 1
• Flash Developer × 3
• Engineer × 4 + α
System Overview
Using Cloud Services
• Amazon Web Services
  - EC2 (Instance Store + EBS)
  - S3
  - CloudFront
  - Route 53
  - Elastic Load Balancing
• Google Apps
• Kontagent
System Relations
[Diagram: the Flash client runs inside an iframe in the HTML page. Flash talks to the Command Server over AMF, while the Web Server serves the HTML page and exchanges JSON with it; the Web Server also handles the API call / callback for the Payment API. AMF = ActionScript Message Format]
Servers
[Diagram of the server layout:
- ELB in front of the Web servers (nginx + Tomcat + mongos, ×3) and the Command servers (nginx + Tomcat + mongos, ×4)
- MongoDB tier: mongod shard nodes (×5) plus mongoc config servers (×3)
- MySQL and memcached server pairs (×2)
- admin server (Tomcat + mongos); monitor server (munin, nagios, nginx)
- build server (Jenkins, Maven); batch servers (MySQL); redmine and SVN
- static content served via S3, CloudFront, and Route 53
- instance types m1.large and m2.2xlarge; data on EBS volumes, with the Secondary running on EBS]
Middleware
• nginx 1.0.x
• Tomcat 7.0.x
• MongoDB 2.0.1
• MySQL 5.5.x
• memcached 1.4.x
Framework / Libraries
• Spring Framework, Spring MVC
• BlazeDS, Spring Flex
• Spring Data - MongoDB 1.0.0 M4
• mongo-java-driver 2.6.5
• Ehcache
• spymemcached
• RestFB
• MyBatis
Why MongoDB ?
•Used in Ameba Pico
http://apps.facebook.com/amebapico/
• I like MongoDB's features!
  - Replica Sets
  - Auto-Sharding
  - Can handle complex data structures
  - Indexes, advanced queries
  - Actively developed
  - Quite easy to understand
• Fits Animal Land's requirements
  - Complex data structures (city grid data, user/structure parameters, ...)
  - Most processes run sequentially and don't update the same data at the same time
  - Maintenance-free operation
    • Change data structures dynamically
  - Durability, scalability
• Resolve MongoDB's weak points in other ways
  - Some data is stored in other databases
    • Payment history needs reliability, so it is stored in MySQL
    • Temporary data is stored in memcached
  - Don't plan on using transactions
Considered Points
Developing Application
• Define data without using transactions
  - User data is defined in one document and updated only once per block
  - User data image:

    {
      "facebookId" : xxx,
      "status" : { "lv" : 10, "coin" : 9999, ... },
      "layerInfo" : "1|1|0|1|2|1|1|3|1|1|4|0...",
      "structure" : {
        "1|1" : { "id" : xxxx, "depth" : 3, "width" : 3, "aspect" : 1, ... },
        ...
      },
      "inventory" : [ { "id" : xxx, "size" : xxx, ... }, ... ],
      "neighbor" : [ { "id" : xxx, "newFlg" : false, ... }, ... ],
      "animal" : [ { "id" : xxx, "color" : 0, "pos" : "20|20", ... } ],
      ...
    }
Developing Application
• Cut down data size as much as possible
  - City grid data is stored in one field (expanded when the application uses it)
  - Data in the design phase (500 KB):
    "layerInfo" : { "1|1" : 0, "1|2" : 1, ... }
  - Current data (50 KB):
    "layerInfo" : "1|1|0|1|2|1|1|3|1|1|4|0..."
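The packed grid string above can be round-tripped with a small codec. A hedged sketch in JavaScript (the server is actually Java; reading the sample string as repeated row|col|layerId triples is an assumption from the sample data):

```javascript
// Sketch of the compact "layerInfo" encoding: each cell is assumed to be
// a (row, col, layerId) triple, and all triples are joined with "|",
// e.g. "1|1|0|1|2|1" = cell (1,1) -> 0, cell (1,2) -> 1.
function packLayerInfo(cells) {
  return cells.map(c => [c.row, c.col, c.layer].join("|")).join("|");
}

function unpackLayerInfo(packed) {
  const parts = packed.split("|").map(Number);
  const cells = [];
  for (let i = 0; i + 2 < parts.length; i += 3) {
    cells.push({ row: parts[i], col: parts[i + 1], layer: parts[i + 2] });
  }
  return cells;
}
```

Packing this way trades a queryable sub-document for one opaque string field, which is the 500 KB → 50 KB saving the slide describes.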
Developing Application
• Be careful not to define too many fields
  - find() took more than 5 sec when data contained more than 20,000 fields (144×144 city grid data)
  - Consider data size, and also the time it takes to parse BSON
Developing Application
• The shard key is decided by the following policy
  - Don't use a low-cardinality key
  - Recently used data should be in memory, and unused data should not be
  - Frequently read/written data should be balanced nicely across the shards
Developing Application
• Use targeted operations as much as possible
  - Use the shard key
  - Use an index for non-shard-key operations

  Operation                                     | Type
  db.foo.find({ShardKey : 1})                   | Targeted
  db.foo.find({ShardKey : 1, NonShardKey : 1})  | Targeted
  db.foo.find({NonShardKey : 1})                | Global
  db.foo.find()                                 | Global
  db.foo.insert(<object>)                       | Targeted
  db.foo.update({ShardKey : 1}, <object>)       | Targeted
  db.foo.remove({ShardKey : 1})                 | Targeted
  db.foo.update({NonShardKey : 1}, <object>)    | Global
  db.foo.remove({NonShardKey : 1})              | Global
Developing Application
• Decrease update frequency
  - Store master data in memory
  - Queue mechanism in Flash
  - User operations are processed in a block (once per 5 sec)
  - The server processes them sequentially

[Diagram: user operations in Flash are stored in a queue, sent to the Command Server in a block once per 5 sec, and processed sequentially on the server]
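The queue mechanism above could look roughly like this (a hedged JavaScript sketch; the real client is Flash/ActionScript, and `sendBlock` stands in for the AMF call to the Command Server):

```javascript
// Sketch of the client-side operation queue: user operations accumulate
// locally and are flushed to the server as one block every 5 seconds.
class OpQueue {
  constructor(sendBlock, intervalMs = 5000) {
    this.ops = [];
    this.sendBlock = sendBlock;
    this.intervalMs = intervalMs;
  }
  push(op) {
    this.ops.push(op);            // user operations accumulate locally
  }
  flush() {
    if (this.ops.length === 0) return;
    const block = this.ops;       // drain the queue ...
    this.ops = [];
    this.sendBlock(block);        // ... and send one block per interval;
  }                               // the server processes it sequentially
  start() {
    this.timer = setInterval(() => this.flush(), this.intervalMs);
  }
}
```

Batching like this turns many tiny writes into one document update per user per interval, which is what keeps the one-document-per-user design viable.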
Developing Application
• Develop efficiently using an O/R mapper
  - Spring Data - MongoDB and wrapper classes:

    @Autowired
    protected MongoTemplate mongoTemplate;

    public void insert(T entity) {
        mongoTemplate.save(entity);
    }

  - Can use any serializable object
  - Maybe easier to deal with than an RDB O/R mapper
Developing Application
• Implement without expecting rollback
  - Give up on some inconsistent data
  - Be careful not to inflict disadvantages on users
Constructing Infrastructure
• Estimate the same way as with other DBs
  - Data size per user (50 KB)
  - Expected users (DAU 700,000)
  - Update frequency
  - Each server's max connections
  - Number of servers, specs, costs
  - Consider when to scale servers according to user growth
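The first two figures above already give a back-of-envelope data size (a sketch; raw user documents only, ignoring indexes, the oplog, and replication overhead):

```javascript
// Rough working-set estimate from the slide's figures.
const bytesPerUser = 50 * 1024;              // ~50 KB per user document
const expectedDAU = 700000;                  // expected DAU 700,000
const totalBytes = bytesPerUser * expectedDAU;
const totalGB = totalBytes / (1024 ** 3);    // ~33 GB of active user data
console.log(totalGB.toFixed(1) + " GB");     // prints "33.4 GB"
```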
Constructing Infrastructure
• Performance verification
  - Bandwidth (EC2 is about 350 Mbps)
  - Verify MongoDB (MongoDB 2.0.1, 2 shards with replica sets, journaling enabled)
    • READ/WRITE performance
    • Performance during migration
    • Performance through the Java driver
  - Performance through the application
Constructing Infrastructure
• Prepare for troubles
  - When mongod dies...
    • Does a SECONDARY become PRIMARY?
    • Does data synchronize when it restarts?
    • Is it safe when a SECONDARY dies?
  - When mongoc dies...
    • Does nothing happen?
  - Can we restore from backup?
  → no problems found
Constructing Infrastructure
• ReplicaSet and shard construction
  - Use large memory sizes
  - Use high-speed I/O disks
  - Place mongoc independently from mongod
  - Use EBS to raise reliability (run it only as a SECONDARY)
• Enable journaling
• Set oplogSize to 10 GB
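The journaling and oplog settings above would map onto a MongoDB 2.0-era mongod config file roughly like this (a sketch; the paths and replica-set name are made up):

```ini
# sketch of one shard member's mongod.conf (MongoDB 2.0-era options)
shardsvr = true
replSet = animal      # hypothetical replica-set name
journal = true        # enable journaling
oplogSize = 10240     # oplog size in MB (= 10 GB)
dbpath = /data/mongodb
```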
Constructing Infrastructure
• Create indexes in advance
  - Index creation blocks all operations...
  - Took about 2 min to create an index for 200,000 documents (each about 50 KB)
  - Create indexes during a maintenance window, or in the background for running services
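The two options above look like this in the mongo shell (the collection and field names are illustrative, not from the source):

```javascript
// During a maintenance window: a plain foreground build, which blocks
// all operations on the database until it finishes.
db.user.ensureIndex({ "neighbor.id": 1 })

// On a running service: a background build - slower, but non-blocking on
// the primary (2.0-era secondaries still build indexes in the foreground).
db.user.ensureIndex({ "neighbor.id": 1 }, { background: true })
```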
Constructing Infrastructure
• Connection pool tuning (the connection pool is for mongos)

  property                                     | description                    | value
  connectionsPerHost                           | number of connections          | 100
  threadsAllowedToBlockForConnectionMultiplier | waiting threads per connection | 4

  - Set the values with the nginx worker and Tomcat thread sizes in mind
  - Be careful with "Out of Semaphores" errors; in the case above the limit would be 100 + (100 × 4) = 500
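The "Out of Semaphores" ceiling above can be double-checked with the slide's formula (a sketch; the exact accounting inside the 2.x Java driver may differ):

```javascript
// Threads that may hold a connection, plus threads allowed to wait for one.
function maxThreads(connectionsPerHost, multiplier) {
  return connectionsPerHost + connectionsPerHost * multiplier;
}
console.log(maxThreads(100, 4)); // 500, matching the slide
```

Requests beyond this ceiling fail immediately rather than queueing, so the Tomcat thread pool must stay below it.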
In Operation
• Check chunk balance routinely
  - Move chunks manually if needed
• Be careful when adding new collections
  - The primary shard might be overloaded, because new collection data is placed there first
Troubles
mongod & mongoc died...
• Caused by the virtual server hosting the EC2 instance; nothing to do with MongoDB
• There was NO impact on the service!!
• Recovered just by restarting the mongod and mongoc processes
Current Problems
Online Backup
• Difficult to back up running services
• Currently backing up while in maintenance mode
• Thinking of doing backups from a SECONDARY as follows, but it needs some verification...
  1. Turn the balancer off
  2. Write-lock a SECONDARY and back it up
  3. Turn the balancer on
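The three steps above might look like this in a 2.0-era mongo shell (unverified, as the slide itself says; `fsyncLock`/`fsyncUnlock` are shell helpers around the `fsync` command):

```javascript
// 1. balancer off (run against a mongos; the flag lives in config.settings)
use config
db.settings.update({ _id: "balancer" }, { $set: { stopped: true } }, true)

// 2. write-lock a SECONDARY, snapshot its EBS volume, then unlock
db.fsyncLock()      // on the SECONDARY
// ... take the backup ...
db.fsyncUnlock()

// 3. balancer on again
db.settings.update({ _id: "balancer" }, { $set: { stopped: false } }, true)
```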
Upgrade
• Timing an upgrade is also difficult
• Need to do the following steps:
  1. Stop the Arbiter, upgrade, restart
  2. Stop the SECONDARY, upgrade, restart
  3. Stop the PRIMARY, upgrade, restart
Adding Shards
• It's possible to add shards later on, but you have to consider balancing performance
  - Pico adds shards while in maintenance mode, for safety
Best Chunksize
• Migration occurs frequently, because user data was too large and the chunk size was small (default 64 MB)
• Need to adjust the chunk size according to the data size
• Maybe nice if you could set the chunk size per collection...
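In MongoDB 2.0 the chunk size is a single cluster-wide value stored in config.settings (in MB), which is why a per-collection setting is only a wish above. Adjusting the global value looks like this:

```javascript
// run against a mongos; new chunk size in MB (default is 64)
use config
db.settings.save({ _id: "chunksize", value: 128 })
```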
Analyzing Data
• Hard to analyze user data, because we handle it all in one document
• Creating indexes as necessary
  - Tested MapReduce, but some performance problems occurred; not using it right now
Thank you! Please enjoy Animal Land!!
http://apps.facebook.com/animal-land/