Engineering Scalable Social Games, Dr. Robert Zubek
![Page 1: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/1.jpg)
![Page 2: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/2.jpg)
Engineering Scalable Social Games
Dr. Robert Zubek
![Page 3: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/3.jpg)
Top social game developer
![Page 4: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/4.jpg)
Growth Example
Roller Coaster Kingdom
1M DAUs over a weekend
source: developeranalytics.com
![Page 5: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/5.jpg)
Growth Example
FishVille
6M DAUs in its first week
source: developeranalytics.com
![Page 6: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/6.jpg)
Growth Example
FarmVille
25M DAUs over five months
source: developeranalytics.com
![Page 7: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/7.jpg)
Talk Overview
Part I. Game Architectures
Part II. Scaling Solutions
Introduce game developers to best
practices for large-scale web development
![Page 8: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/8.jpg)
Part I. Game Architectures
• server approaches
• client approaches
![Page 9: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/9.jpg)
Three Major Types
• Web server stack: HTML client or Flash client
• Web + MMO stack: Flash client
![Page 10: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/10.jpg)
Server Side
Web stack
• Usually based on a LAMP stack
• Game logic in PHP
• HTTP communication
Mixed stack
• Game logic in MMO server (e.g. Java)
• Web stack for everything else
![Page 11: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/11.jpg)
Example 1
FishVille
Web Servers + CDN
Cache and Queue Servers
DB Servers
![Page 12: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/12.jpg)
Example 2
YoVille
Web Servers + CDN
MMO Servers
Cache and Queue Servers
DB Servers
![Page 13: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/13.jpg)
Why web stack?
• HTTP scales very well
• Stateless request/response
• Easy to load-balance across servers
• Easy to add more servers
Some limitations:
• Server-initiated actions
• Storing state between requests
![Page 14: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/14.jpg)
Web stack and state (example)
[Diagram: Client; web servers Web #1 to Web #4; caches; DB Servers]
![Page 15: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/15.jpg)
MMO servers
• Minimally “MMO”:
• Persistent socket connection per client
• Live game support: chat, live events
• Supports data push
• Keeps game state in memory
• Very different from web!
![Page 16: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/16.jpg)
Why MMO servers?
• Harder to scale out!
• But on the plus side:
1. Supports live communication and events
2. Game logic can run on the server, independent of client’s actions
3. Game state can be entirely in RAM
![Page 17: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/17.jpg)
Example Web + MMO stack
[Diagram: Client, Web Servers, MMO Servers, caches, DB Servers]
![Page 18: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/18.jpg)
Client Side
HTML + AJAX
• The game is “just” a web page
• Minimal system requirements
• Limited graphics and animation
Flash
• High production quality games
• Game logic on client
• Can keep open socket
![Page 19: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/19.jpg)
Social Network Integration
• “Easy” because not related to scaling
• Calling the host network to get a list of friends, post something, etc.
• Networks provide REST APIs, and sometimes client libraries
![Page 20: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/20.jpg)
Architectures
Data: database, cache, etc.
Server: Web server stack | Web + MMO
Client: HTML or Flash | Flash
![Page 21: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/21.jpg)
Part I. Game Architectures
Part II. Scaling Solutions
Introduce game developers to best
practices for large-scale web development
not blowing up as you grow
![Page 22: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/22.jpg)
Scaling
Scale up vs. scale out
• Scale up: “I need a bigger box”
• Scale out: “I need more boxes”
![Page 23: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/23.jpg)
Scaling
Scale up vs. scale out
Scale out has been a clear win
• At some point you can’t get a box big enough, quickly enough
• Much easier to add more boxes
• But: requires architectural support from the app!
![Page 24: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/24.jpg)
Example:
Scaling up vs. out
• 500,000 daily players in first week
• Started with just one DB, which quickly became a bottleneck
• Short term: scaled up, optimized DB accesses, fixed slow queries, caching, etc.
• Long term: must scale out with the size of the player base!
![Page 25: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/25.jpg)
Scaling
We’ll look at four elements:
Databases
Game servers (web + MMO)
Caching
Capacity planning
![Page 26: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/26.jpg)
II A. Databases
DBs are the first to fall over
[Diagram: Client, App Servers, caches, queues, DB]
![Page 27: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/27.jpg)
[Diagram: App with writes (W) and reads (R) going to a database]
![Page 28: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/28.jpg)
[Diagram: App servers with writes (W) and reads (R); one master (M) and two slaves (S)]
![Page 29: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/29.jpg)
[Diagram: App servers; one master (M) with two slaves (S), alongside separate masters (M)]
![Page 30: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/30.jpg)
[Diagram: App servers; master (M) next to masters M0 and M1]
![Page 31: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/31.jpg)
How to partition data?
• Two most common ways:
• Vertical – by table
• Easy but doesn’t scale with DAUs
• Horizontal – by row
• Harder to do, but gives best results!
• Different rows live on different DBs
• Need a good mapping from row # to DB
![Page 32: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/32.jpg)
Row striping
id Data
100 foo
101 bar
102 baz
103 xyzzy
Shards: M0, M1
• Row-to-shard mapping:
• Primary key modulo # of DBs
• Like a “logical RAID 0”
• More clever schemes exist, of course
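The modulo mapping on this slide can be sketched in a few lines. This is only an illustration: the function name `shard_for` and the use of plain dictionaries as stand-ins for M0 and M1 are assumptions, not part of the talk.

```python
def shard_for(primary_key: int, num_shards: int) -> int:
    """Return the index of the shard that owns this row (primary key modulo # of DBs)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return primary_key % num_shards


# The example rows from the slide.
rows = {100: "foo", 101: "bar", 102: "baz", 103: "xyzzy"}

# Two dictionaries stand in for the databases M0 and M1 (hypothetical stand-ins).
shards = [dict() for _ in range(2)]
for row_id, data in rows.items():
    shards[shard_for(row_id, len(shards))][row_id] = data
```

With two shards, even ids land on M0 and odd ids on M1. One known trade-off of plain modulo is that changing the number of DBs remaps most rows, which is one reason the slide notes that more clever schemes exist.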
![Page 33: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/33.jpg)
Scaling out your shards
[Diagram: Servers; masters M0 and M1; slaves S0 and S1 relabeled as M2 and M3, with two links marked “x”]
![Page 34: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/34.jpg)
Sharding in a nutshell
• It’s just data striping across databases (“logical RAID 0”)
• There’s no automatic replication, etc.
• No magic about how it works
• That’s also what makes it robust, easy to set up and maintain!
![Page 35: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/35.jpg)
Example:
Sharding
• Biggest challenge: designing how to partition huge existing dataset
• Partitioned both horizontally and vertically
• Lots of joins had to be broken
• Data patterns needed to be redesigned
• With sharding, you need to know the shard ID for each query!
• Data replication had difficulty catching up with high-volume usage
![Page 36: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/36.jpg)
Sharding surprises
Be careful with joins
• Can’t do joins across shards
• Instead, do multiple selects, or denormalize your data
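As a rough sketch of the "multiple selects" option, the example below replaces a cross-shard join with one select on the player's shard followed by one select per shard that holds a friend. The schema, the table names, and the in-memory SQLite databases standing in for shards are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

NUM_SHARDS = 2


def make_shard() -> sqlite3.Connection:
    """Create one in-memory database standing in for a shard (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_of(user_id: int) -> sqlite3.Connection:
    return shards[user_id % NUM_SHARDS]


def add_user(user_id: int, name: str) -> None:
    shard_of(user_id).execute("INSERT INTO users VALUES (?, ?)", (user_id, name))


def add_friend(user_id: int, friend_id: int) -> None:
    # The friend list is stored on the owning user's shard.
    shard_of(user_id).execute("INSERT INTO friends VALUES (?, ?)", (user_id, friend_id))


def friend_names(user_id: int) -> list:
    # Select 1: friend ids from the user's own shard.
    cursor = shard_of(user_id).execute(
        "SELECT friend_id FROM friends WHERE user_id = ?", (user_id,))
    friend_ids = [row[0] for row in cursor]
    # Group the ids by the shard that owns each friend.
    by_shard = defaultdict(list)
    for friend_id in friend_ids:
        by_shard[friend_id % NUM_SHARDS].append(friend_id)
    # Select 2..n: one query per shard involved, merged in the application.
    names = []
    for index, ids in by_shard.items():
        marks = ",".join("?" * len(ids))
        cursor = shards[index].execute(
            f"SELECT name FROM users WHERE id IN ({marks})", ids)
        names.extend(row[0] for row in cursor)
    return sorted(names)


for uid, name in [(100, "Ann"), (101, "Bob"), (102, "Cy"), (103, "Dee")]:
    add_user(uid, name)
for friend in (101, 102, 103):
    add_friend(100, friend)
```

The application layer does the work the join would have done, at the cost of extra round trips. Denormalizing (for example, copying friend names onto the player's shard) trades that cost for extra writes.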
![Page 37: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/37.jpg)
Sharding surprises
Skip transactions and foreign key constraints
• CPU-expensive; easier to pay this cost in the application layer (just be careful)
• The more you keep in RAM, the less you’ll need these
![Page 38: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/38.jpg)
II B. Caching
• DBs can be too slow
• Spend memory to buy speed!
[Diagram: Client, App Servers, caches, queues, DB]
![Page 39: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/39.jpg)
Caching
• Two main uses:
• Speed up DB access to commonly used data
• Shared game state storage for stateless web game servers
• Memcached: network-attached RAM cache
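The first use, speeding up DB access, is commonly done with a read-through (cache-aside) lookup. The sketch below shows that pattern under stated assumptions: `FakeCache` is an in-process stand-in rather than a real memcached client, and `DATABASE`, `load_user_from_db`, and `get_user` are hypothetical names.

```python
class FakeCache:
    """In-process stand-in for a network cache; a real deployment would talk to memcached."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


# Hypothetical "database" plus a counter so the effect of caching is visible.
DATABASE = {123: {"name": "Robert"}}
stats = {"db_reads": 0}


def load_user_from_db(uid):
    stats["db_reads"] += 1
    return DATABASE.get(uid)


def get_user(cache, uid):
    """Try the cache first; on a miss, read the DB and populate the cache."""
    key = f"uid_{uid}"
    user = cache.get(key)
    if user is None:
        user = load_user_from_db(uid)
        if user is not None:
            cache.set(key, user)
    return user


cache = FakeCache()
first = get_user(cache, 123)   # miss: goes to the DB
second = get_user(cache, 123)  # hit: served from the cache
```

Because every stateless web server can reach the same cache, the same lookup also covers the second use on the slide: sharing game state between requests that land on different servers.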
![Page 40: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/40.jpg)
Memcache
• Stores simple key-value pairs
• e.g. “uid_123” => {name:“Robert”, …}
• What to put there?
• Structured game data
• Mutexes for actions across servers
• Caveat
• It’s an LRU cache, not a DB! No persistence!
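One way to build the cross-server mutex mentioned above is on memcached's `add` command, which stores a value only if the key does not already exist. The sketch below imitates that behaviour with an in-process class; `FakeMemcache`, `try_lock`, `unlock`, and the `lock_` key prefix are assumptions for illustration, and the talk does not say how its mutexes were implemented.

```python
import time


class FakeMemcache:
    """In-process stand-in that mimics memcached's add/delete semantics."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, ttl_seconds):
        """Store the value only if the key is absent or expired; report success."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + ttl_seconds)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def try_lock(cache, name, owner, ttl_seconds=5.0):
    """Attempt to take the lock; only one caller can succeed until it is released."""
    return cache.add(f"lock_{name}", owner, ttl_seconds)


def unlock(cache, name):
    cache.delete(f"lock_{name}")


cache = FakeMemcache()
got_first = try_lock(cache, "uid_123", owner="web1")   # first server wins
got_second = try_lock(cache, "uid_123", owner="web2")  # second server must wait
unlock(cache, "uid_123")
got_third = try_lock(cache, "uid_123", owner="web2")   # free again after release
```

The expiry time keeps a crashed server from holding the lock forever. The caveat on the slide still applies: an LRU cache may evict the key under memory pressure, so a lock like this is best treated as advisory.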
![Page 41: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/41.jpg)
Memcache scaling out
DBs MCs App
Shard it just like the DB!
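Sharding the cache can reuse the modulo idea from the database slides, with a hash step because cache keys are strings. This is a minimal sketch: the server addresses are made up, and CRC32 is just one convenient stable hash, not necessarily what the talk's systems used.

```python
import zlib

# Hypothetical memcached instances; 11211 is memcached's default port.
MEMCACHE_SERVERS = ["mc0:11211", "mc1:11211", "mc2:11211"]


def server_for(key: str, servers=MEMCACHE_SERVERS) -> str:
    """Pick the cache server for a key: stable hash of the key, modulo # of servers."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


chosen = server_for("uid_123")
```

Every app server that uses the same server list and hash will look for a given key on the same cache machine, so no coordination between cache servers is needed.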
![Page 42: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/42.jpg)
II C. Game Servers
[Diagram: Client, Web Servers, MMO Servers, caches, queues, DB]
![Page 43: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/43.jpg)
Scaling web servers
[Diagram: DNS and load balancers (LB)]
![Page 44: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/44.jpg)
Scaling content delivery
[Diagram: Client, CDN, Web Servers; arrows labeled “media files” and “game data only”]
![Page 45: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/45.jpg)
Scaling MMO servers
• Scaling out is easiest when servers don’t need to talk to each other
• Databases: sharding
• Memcache: sharding
• Web: load-balanced farms
• MMO: ???
• Our MMO server approach: shard just like other servers
![Page 46: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/46.jpg)
Scaling MMO servers
• Keep all servers separate from each other
• Minimize inter-server communication
• All participants in a live event should be on the same server
• Scaling out means no direct sharing
• Sharing through third parties is okay
![Page 47: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/47.jpg)
Problem #1:
Load balancing
• What if a server gets overloaded?
• Can’t move a live player to a different server
• Poor man’s load-balancing:
• Players don’t pick servers; the LB does
• If a server gets hot, remove it from the LB pool
• If enough servers get hot, add more server instances and send new connections there
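The "poor man's" scheme above can be sketched as a small pool object. Several details here are assumptions rather than anything stated in the talk: load is measured as a connection count, "hot" means reaching a fixed threshold, and a new instance is added only once every server is hot (a simplification of "if enough servers get hot"). The class and server names are hypothetical.

```python
class MmoPool:
    """Toy load balancer: assigns new connections, never moves existing players."""

    def __init__(self, servers, hot_threshold):
        self.load = {name: 0 for name in servers}  # connections per server
        self.in_pool = set(servers)                # servers accepting new players
        self.hot_threshold = hot_threshold
        self._next_id = len(servers)

    def _add_server(self):
        # Stand-in for launching a new server instance.
        name = f"mmo{self._next_id}"
        self._next_id += 1
        self.load[name] = 0
        self.in_pool.add(name)

    def assign(self):
        """The LB, not the player, picks the server for a new connection."""
        if not self.in_pool:
            self._add_server()
        # Least-loaded server in the pool; ties broken by name for determinism.
        name = min(self.in_pool, key=lambda s: (self.load[s], s))
        self.load[name] += 1
        if self.load[name] >= self.hot_threshold:
            # Hot server: stop sending new players, keep the ones it already has.
            self.in_pool.discard(name)
        return name


pool = MmoPool(["mmo0", "mmo1"], hot_threshold=2)
assigned = [pool.assign() for _ in range(5)]
```

Players already on a hot server stay where they are; only new connections are steered away, which matches the constraint that a live player can't be moved.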
![Page 48: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/48.jpg)
Problem #2:
Deployment
Downtime = lost revenues
• You can measure cost of downtime in $/minute!
![Page 49: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/49.jpg)
Problem #2:
Deployment
Deployment comparison:
• Web: just copy over PHP files
• Socket servers – much harder
• How to deploy with zero downtime?
• Ideally: set up shadow new server and slowly transition players over
![Page 50: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/50.jpg)
II D. Capacity planning
![Page 51: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/51.jpg)
Capacity
We believe in scaling out
• Demand could change rapidly
• How to provision enough servers?
Different logistics than scaling up
![Page 52: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/52.jpg)
Capacity:
colo vs. cloud
Cloud
• Low or zero fixed cost
• Higher variable cost
• Little or no lead time on additional capacity
• Less control over ops
• Limited choice of VMs
• Virtualized storage
• Can’t scale up! Only out.
Colo
• Higher fixed cost
• Lower variable cost
• Significant lead time on additional capacity
• More control over ops
• Any chips you want
• Any disks you want
• Scale up or out
![Page 53: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/53.jpg)
Operations
• Scaling out: a legion of servers
• You’ll need tools
• App health – custom dashboard
• Server monitoring – Munin
• Automatic alerts – Nagios
![Page 54: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/54.jpg)
Ops: Dashboard
![Page 55: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/55.jpg)
Ops: Munin
Server stats from the entire farm
![Page 56: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/56.jpg)
Ops: Nagios
• SMS alerts for your server stats
• Connections drop to zero
• Or CPU load exceeds 4
• Or test account can’t connect, etc.
• Put alerts on business stats, too!
• DAUs dropping below daily avg, etc.
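The alert conditions listed above amount to a handful of threshold checks. The sketch below expresses them in plain Python to make the rules concrete; it is not Nagios configuration or a Nagios API, and the `Stats` fields and the `alerts_for` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """One snapshot of server and business stats (field names are illustrative)."""
    connections: int
    cpu_load: float
    test_account_ok: bool
    dau_today: int
    dau_daily_avg: float


def alerts_for(stats, max_cpu_load=4.0):
    """Return the list of alert messages triggered by this snapshot."""
    alerts = []
    if stats.connections == 0:
        alerts.append("connections dropped to zero")
    if stats.cpu_load > max_cpu_load:
        alerts.append(f"CPU load exceeds {max_cpu_load}")
    if not stats.test_account_ok:
        alerts.append("test account can't connect")
    # Business stat, checked alongside the server stats.
    if stats.dau_today < stats.dau_daily_avg:
        alerts.append("DAUs below daily average")
    return alerts


healthy = Stats(connections=1200, cpu_load=1.5, test_account_ok=True,
                dau_today=510_000, dau_daily_avg=500_000)
failing = Stats(connections=0, cpu_load=6.2, test_account_ok=False,
                dau_today=300_000, dau_daily_avg=500_000)
```

In practice each non-empty result would be handed to whatever sends the SMS; that delivery step is left out here.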
![Page 57: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/57.jpg)
Server failures
• Common in the cloud
• Dropping off the net
• Sometimes even shutting down
• Be defensive
• Ops should be prepared
• Devs should program defensively
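The talk does not spell out what defensive programming looks like here. One common reading, shown purely as an assumption, is to expect calls to other servers to fail occasionally and retry them with a short backoff before giving up. The function names below are hypothetical.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.01):
    """Run operation(); on ConnectionError, retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller handle the failure
            time.sleep(base_delay * (2 ** attempt))


# A fake remote call that fails twice (as if a server dropped off the net), then works.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server dropped off the net")
    return "game state"


result = call_with_retries(flaky_fetch)
```

Retries only help with transient failures; when a server has shut down for good, the caller still needs a fallback path.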
![Page 58: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/58.jpg)
The End
Many thanks to those who taught me a lot:
Virtual Worlds: Scott Dale, Justin Waldron
Coaster: Biren Gandhi, Nathan Brown, Tatung Mei
http://robert.zubek.net/gdc2010
![Page 59: Engineering Scalable Social Games Dr. Robert Zubek](https://reader037.vdocument.in/reader037/viewer/2022103007/56649ddf5503460f94ad996f/html5/thumbnails/59.jpg)