Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
TRANSCRIPT
Melissa Benua, Siva Katir
PlayFab, Inc.
Mobile Dev + Test 2016
Don’t be your own worst enemy!
The Simpsons: Tapped Out launched by EA in 2012
Backend was so unprepared for massive loads of traffic that it was pulled for FIVE months for a total redesign
Went on to become a huge and long-lasting hit in the market for many years afterwards
Can your company afford to add an extra 5 months to the development cycle? Including lost marketing and promotional spend? Including lost mindshare? Including bad press?
Be your own guardian angel!
Loadout launched on Steam by Edge of Reality
500x increase in players overnight on being featured in Steam store
EC2 auto-scaled in atomic and replaceable servers instantly to handle load
No downtime, no panic, no fires
DO YOU EVEN NEED A BACKEND?
Maybe! Maybe not!
What can my backend do for me?
Push updates without going through full certification process
• New artwork? No problem!
• Message of the day!
• In-app purchase promotions!
Improve customer service
• Have an authoritative source for what a client ‘has’
• Direct access to grant entitlements to remediate issues
What can my backend do for me?
Support a single user across multiple devices
• Recover a user’s session even if they lose or replace their device
• Continue the same session across multiple devices (phone to tablet)
Perform ‘trusted’ transactions (especially around receipt verification)
• Clients are untrustworthy!
• Client-to-Provider transaction can only say if a receipt is valid, NOT if a receipt is valid for your app
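A minimal sketch of the server-side check behind the last point, assuming the store provider has already validated the receipt and returned it as a dictionary. The field names (`bundle_id`, `product_id`, `transaction_id`), the app identifier, and the product catalog are hypothetical, not any provider's actual response format.

```python
# Hypothetical values for illustration; substitute your own app's identifiers.
EXPECTED_BUNDLE_ID = "com.example.mygame"
KNOWN_PRODUCTS = {"in_app_1"}


def receipt_is_valid_for_app(receipt: dict, seen_transactions: set) -> bool:
    """Return True only if a provider-validated receipt belongs to this app."""
    if receipt.get("bundle_id") != EXPECTED_BUNDLE_ID:
        return False  # a genuine receipt, but issued for a different app
    if receipt.get("product_id") not in KNOWN_PRODUCTS:
        return False  # not something this app sells
    txn = receipt.get("transaction_id")
    if txn is None or txn in seen_transactions:
        return False  # missing ID, or a replay of a receipt already redeemed
    seen_transactions.add(txn)
    return True
```

In a real backend the set of seen transactions would live in the database rather than in memory, so that a replayed receipt is rejected no matter which server handles it.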
Know Your Project
What is your budget?
• What does it cost to host?
• What does it cost to run?
Who are your engineers?
• Do you have the in-house expertise to manage all services?
• DevOps? Backend? Whole-Stack? Front-End?
• Are they willing to be on-call 24x7?
What do you need to put in the cloud? Why?
Know Your Data
What data are you storing?
• User data
• Group data
• Application data
How does each piece of data need to be queried?
• Can all data be looked up by a key?
• Need to do arbitrary field queries?
Is the data read and/or write heavy?
How much data do you expect to store per user?
BUILDING A BACKEND 101
Not taught in schools!
Pick a Cloud Provider
Is your language well supported by your provider?
How much self-management is required for each service?
How well is scalability built in?
Do you have region requirements?
• European data protection laws
• Russia and China have special data laws
Large Needs or Small Needs?
Database + basic CRUD APIs?
• AWS Lambda!
Complex data + user management?
• AWS Mobile or Azure Mobile Services!
Highly custom requirements?
• Roll your own on a public cloud (PROCEED WITH CAUTION!)
Storing and Retrieving Data
Know your database’s strengths
• MySQL – Very easy to get started with and widely supported
• MS-SQL – Powerful query engine and incredibly performant
• MongoDB – Can query against arbitrary fields
• DynamoDB – Very easy scaling and fast random access
Know their weaknesses too
• MySQL – Very hard to scale
• MS-SQL – Still pretty hard to scale
• MongoDB – Very hard to scale correctly while maintaining data integrity
• DynamoDB – Can only query cost-effectively against predefined indexes
Storing and Retrieving Data
Novel solutions to database shortcomings
• Use multiple databases to take advantage of their individual strengths
• Example: Store “index” data in SQL, while using DynamoDB for the actual data storage that clients use
• Allows you to store all data without needing to scale a difficult-to-scale database
Keys:
• Have a way to reliably update the SQL database out of the user’s flow
• Don’t treat the SQL store as authoritative
• Some tools can make this entirely seamless, such as using DynamoDB write streams and Lambda to update SQL through
SQL:
{
  "playerId": "00001",
  "purchaseId": 1002092,
  "purchaseValue": 0.99,
  "purchaseDate": "03/01/2016 09:01:05"
}
DynamoDB:
{
  "playerId": "00001",
  "purchaseId": 1002092,
  "purchasedItems": [{ "itemName": "in_app_1", "purchasePrice": 0.99 }]
}
SELECT purchaseId, purchaseValue FROM sqlPurchaseTable WHERE purchaseDate > '3/1/2016';
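The split between an SQL "index" and a key-value data store could be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the SQL database, a plain dictionary stands in for DynamoDB, and the function names are hypothetical. Dates are stored in ISO format here so that string comparison orders them correctly.

```python
import sqlite3

# Stand-ins (assumptions): SQLite plays the SQL index, a dict plays DynamoDB.
index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE sqlPurchaseTable ("
    "playerId TEXT, purchaseId INTEGER PRIMARY KEY, "
    "purchaseValue REAL, purchaseDate TEXT)"
)
document_store = {}  # keyed by (playerId, purchaseId)


def record_purchase(player_id, purchase_id, items, purchase_date):
    # 1. Authoritative write: the full record goes to the key-value store.
    document_store[(player_id, purchase_id)] = {
        "playerId": player_id,
        "purchaseId": purchase_id,
        "purchasedItems": items,
    }
    # 2. Index write: only the queryable fields go to SQL. In production this
    #    step would run outside the user's flow (for example, from a stream).
    total = sum(item["purchasePrice"] for item in items)
    index_db.execute(
        "INSERT OR REPLACE INTO sqlPurchaseTable VALUES (?, ?, ?, ?)",
        (player_id, purchase_id, total, purchase_date),
    )


def purchases_since(date_iso):
    # Arbitrary-field query hits the SQL index, then full records are
    # fetched by key from the authoritative store.
    rows = index_db.execute(
        "SELECT playerId, purchaseId FROM sqlPurchaseTable "
        "WHERE purchaseDate > ?",
        (date_iso,),
    ).fetchall()
    return [document_store[(player, purchase)] for player, purchase in rows]


record_purchase(
    "00001", 1002092,
    [{"itemName": "in_app_1", "purchasePrice": 0.99}],
    "2016-03-01 09:01:05",
)
```

Because the SQL table is rebuilt from the key-value records, it can be dropped and regenerated at any time, which is what makes it safe to treat as non-authoritative.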
Plan For Failure
Design for the worst, hope for the best
• Any machine can go down at any time
• No machine should be ‘special’
If any machine can go down then any machine can also be brought up
Architect-in failure behavior both up and down the stack
• DB times out?
• Web server disk fails?
• Third-party provider goes down?
http://gunshowcomic.com/648
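One way to build failure behavior into a call, sketched under assumptions: the dependency signals failure with Python's built-in `TimeoutError` or `ConnectionError`, and a degraded fallback (such as cached data) is acceptable. The function name and default values are hypothetical.

```python
import random
import time


def call_with_retries(operation, fallback, attempts=3, base_delay=0.1,
                      sleep=time.sleep):
    """Run operation; retry on transient failure, then use the fallback."""
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                # Exponential backoff with jitter, so that many clients do
                # not all retry at the same instant.
                delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
                sleep(delay)
    # Every attempt failed: degrade gracefully instead of crashing.
    return fallback()
```

The `sleep` parameter exists only so the delay can be replaced in tests; callers would normally leave it at the default.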
COMMON PITFALLSIt’s a trap!
Saving Data
Remote != Local
Do:
• Save only changed data
• Save data in batches
• Prepare for connection failures
• Prepare for client failures
• Prepare for server failures
Don’t:
• Save on a timer (unless it’s retrying)
• Save duplicated data
• Expect it to work
• Make assumptions about whether it worked
http://cloudtweaks.com/
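The "Do" list above could look roughly like this on the client. This is a sketch, not a definitive implementation: `BatchedSaver` and the `send_batch` callable are hypothetical names, and the transport is assumed to return `True` on a confirmed save or raise `ConnectionError` on failure.

```python
class BatchedSaver:
    """Tracks changed keys and sends them to the server in one batch."""

    def __init__(self, send_batch):
        self._send_batch = send_batch  # callable(dict) -> bool (assumed)
        self.saved = {}    # last state the server confirmed
        self.pending = {}  # changed keys not yet confirmed

    def set(self, key, value):
        if key in self.saved and self.saved[key] == value:
            self.pending.pop(key, None)  # same as saved value: nothing to send
        else:
            self.pending[key] = value

    def flush(self):
        """Send pending changes; keep them queued if the save is not confirmed."""
        if not self.pending:
            return True
        batch = dict(self.pending)
        try:
            confirmed = bool(self._send_batch(batch))
        except ConnectionError:
            confirmed = False
        if confirmed:
            self.saved.update(batch)
            for key, value in batch.items():
                if self.pending.get(key) == value:
                    del self.pending[key]
        return confirmed
```

Changes stay in `pending` until the server confirms them, so a failed flush can simply be retried later without assuming anything about whether the data arrived.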
Loading Data
Easy Wins
• Client:
  • Pre-load data during idle times
  • Cache locally
  • Assume data can fail to load
  • Assume data can arrive corrupted or out of order
  • Assume it will load slowly
  • If security matters, connect via SSL
  • Don’t connect directly to the data store
• Server:
  • Cache data that is OK to serve stale
  • Design data schemas so that each request performs as few queries as possible
  • Design authorization so that it prevents, or at least limits, extra queries
Easy Fails
• Trying to implement a custom SSL service
• Trying to be clever with caching
• Assuming anything will work on the first try
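A client-side loader that caches locally and assumes data can fail to arrive or arrive corrupted might be sketched like this. It assumes the server sends a SHA-256 checksum alongside each JSON payload; that protocol detail, the `CachedLoader` class, and the `fetch` callable are all hypothetical.

```python
import hashlib
import json


def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


class CachedLoader:
    """Loads JSON by key, falling back to the last good local copy."""

    def __init__(self, fetch):
        # fetch: callable(key) -> (payload_bytes, checksum_hex); may raise
        self._fetch = fetch
        self._cache = {}

    def load(self, key):
        try:
            payload, expected = self._fetch(key)
            if checksum(payload) != expected:
                raise ValueError("payload failed checksum")
            data = json.loads(payload)  # JSONDecodeError is a ValueError
            self._cache[key] = data
            return data
        except (ConnectionError, TimeoutError, ValueError):
            # Serve the last good copy; None means it was never loaded.
            return self._cache.get(key)
```

The caller still has to handle the `None` case, which is the point: the load can fail and the client needs a plan for that.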
Scalability
Don’t optimize early
• Actually know what your bottlenecks are; most likely it is NOT string handling!
• Run a realistic load test with a profiler to get actual useful data
Don’t run blind
• Know your KPIs before launch
• Track your KPIs in real time via counters with DataDog or CloudWatch
• Set up alerting to your DRI (directly responsible individual)
Scalability
Know what infrastructure to scale and when
• Data
• API servers
• Load balancers
Design to scale horizontally, not vertically
• All services should be stateless unless they don’t need to scale with the number of users
• Don’t assume a server will exist from minute to minute
Keep a safe capacity margin in your infrastructure
• 50% is reasonable
• Know how long it will take to increase capacity
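The capacity margin can be turned into simple arithmetic. The sketch below assumes "50% margin" means provisioning 1.5 times the expected peak (it could also be read as running at 50% utilization), and all traffic numbers are made up for illustration.

```python
import math


def servers_needed(peak_rps, rps_per_server, margin=0.5):
    """Servers required to handle peak load plus a safety margin."""
    required_capacity = peak_rps * (1 + margin)
    return math.ceil(required_capacity / rps_per_server)


def margin_covers_scale_up(current_rps, capacity_rps, growth_rps_per_min,
                           minutes_to_scale):
    """True if load stays within capacity until new servers come online."""
    load_when_ready = current_rps + growth_rps_per_min * minutes_to_scale
    return load_when_ready <= capacity_rps


# Hypothetical numbers: 10,000 requests/s at peak, 500 requests/s per server.
fleet = servers_needed(10_000, 500)  # 30 servers
```

The second function is why the time to add capacity matters: with 15,000 requests/s of capacity and load growing by 1,000 requests/s per minute from 10,000, a 4-minute scale-up fits inside the margin but a 6-minute one does not.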
Managing Connections
Use connection pooling
Don’t try to outsmart your language’s connection management
Making a connection has a cost!
Don’t re-invent a protocol if an existing one will do
• HTTP is way easier to debug than websockets
• Websockets stream data way more efficiently than HTTP
• Both are safer than using raw TCP
QUESTIONS?
Melissa Benua
@queenofcode
http://www.slideshare.net/MelissaBenua
Siva Katir
@sivakatir