scaling facebook's realtime endpoint with mongodb, snap interactive
Scaling The Facebook Realtime Endpoint Using MongoDB
PRESENTED BY: Justin Medoy and Mike Sherov
SNAP Interactive
[email protected]@snap-interactive
Redefining the Way People Meet & Socialize Online
What are Facebook Realtime Updates?
Facebook says: "Real-time updates enable your application to subscribe to changes in data in Facebook."
What it means: "You provide a URL, Facebook pings it when users do stuff."
Pings from Facebook
● Every minute we get around 20 pings from Facebook, which together contain data for around 11,000 users
{
  "object": "user",
  "entry": [
    { "uid": 1335845740, "changed_fields": [ "name", "picture" ], "time": 232323 },
    ....
  ]
}
WHAT?!? Where's the data?
● Facebook tells you that something about the field changed, but not what the current data is.
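
A minimal sketch of reading such a ping, assuming the body has the shape shown on the slide. The second entry and the helper name `changed_fields_by_uid` are made up for illustration; the payload only names the fields that changed, so the result is a map from uid to field names, not the new values.

```python
import json

# Sample body shaped like the slide's payload; the second entry is illustrative.
payload = """
{"object": "user",
 "entry": [
   {"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323},
   {"uid": 1335845741, "changed_fields": ["likes"], "time": 232324}
 ]}
"""

def changed_fields_by_uid(raw_body):
    """Map each uid in a ping to the set of field names Facebook flagged."""
    ping = json.loads(raw_body)
    changes = {}
    for entry in ping.get("entry", []):
        # Merge field names in case the same uid shows up more than once.
        changes.setdefault(entry["uid"], set()).update(entry["changed_fields"])
    return changes

changes = changed_fields_by_uid(payload)
print(sorted(changes))  # the uids whose data must be fetched from the Graph
```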
Retrieving User Data from the Graph
● Solution: go back to Facebook and grab the user's data
https://graph.facebook.com?ids=<USERID>&fields=music,movies,likes
*This will only get data that the user has made publicly available
● To avoid timeouts, each call to Facebook only asks for the data for 25 users
*Our cURL timeouts for Facebook have been lowered from the default 60 seconds to 25 seconds
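
The batching can be sketched as follows. The base URL, field list, batch size, and timeout come from the slides; passing several ids as a comma-separated `ids` value, and the helper names `batches` and `graph_url`, are assumptions made for this sketch. No request is actually sent.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"  # base URL from the slide
BATCH_SIZE = 25       # users per call, per the slide
TIMEOUT_SECONDS = 25  # would be handed to the HTTP client; unused in this sketch

def batches(uids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` uids."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]

def graph_url(uid_batch, fields=("music", "movies", "likes")):
    """Build one Graph URL for a batch (comma-separated ids are an assumption)."""
    query = urlencode(
        {"ids": ",".join(str(u) for u in uid_batch), "fields": ",".join(fields)},
        safe=",",  # keep commas readable instead of percent-encoding them
    )
    return f"{GRAPH_URL}?{query}"

# 60 illustrative uids become three calls of 25, 25, and 10 users.
urls = [graph_url(batch) for batch in batches(list(range(1, 61)))]
print(len(urls))
```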
Update the user's profile
● Facebook won't tell you exactly what's changed but we can figure it out from our own data
All Data - Stored Data = Changed Data
● The next step is to update the user's profile with this changed data
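
One way to read "All Data - Stored Data = Changed Data" is as a per-field set difference. The sketch below assumes each field holds a list of names; the `removed` side goes beyond what the slide states and is included only as an illustration, as are the sample values and the name `changed_data`.

```python
def changed_data(fetched, stored):
    """Per field, return what the Graph has that we lack, and the reverse."""
    changes = {}
    for field in fetched.keys() | stored.keys():
        new = set(fetched.get(field, []))
        old = set(stored.get(field, []))
        added, removed = new - old, old - new
        if added or removed:
            # "removed" is an assumption; the slide only shows the subtraction.
            changes[field] = {"added": sorted(added), "removed": sorted(removed)}
    return changes

fetched = {"music": ["Lady Gaga", "Daft Punk"], "movies": ["Inception"]}
stored = {"music": ["Lady Gaga"], "movies": ["Inception"]}
diff = changed_data(fetched, stored)
print(diff)
```

Only fields that actually differ appear in the result, so an unchanged profile produces no update.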
Mongo Architecture
● Mongo 2.0.2
● Mongo PHP driver 1.2.10
● Two separate replica sets
○ User data
○ Interest data
● Why separate replica sets?
○ Keep as much of the index as possible in memory
○ Disk reads are expensive
User Data Replica Set
Design Challenge
● Random access pattern over 106 million documents
User Data Replica Set
● Large $in queries
● High page faults in MMS
● We upgraded from 32G to 128G on each node
Indexes
● We added duplicates of some of our indexes with reversed fields
● Updating all of these extra indexes was a huge bottleneck
Indexes
● Unique index uid_1
● profile.sync_1_installed_1_platforms.facebook_1
● email_1
● uid_1_installed_1
● last_login_1_uid_1
Indexes
● There were certain minutes when Facebook would tell us that the data had changed for more than 40,000 users
○ Limit the amount of data Facebook can send in one minute
● High number of writes and a large number of indexes prevented the secondaries from reading the oplog because of the global write lock
○ Increase the size of the oplog
○ This is fixed in 2.2.1
Indexes and the realtime endpoint
profile.sync_1_installed_1_platforms.facebook_1
● Filtered 11,000 users a minute down to a few hundred
○ Moved filtering logic out of PHP into the index
● Added efficiency from covered index
○ All we need is platforms.facebook, which is part of the index
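
A rough way to reason about whether a query is covered: every field it filters on and every field it returns must be in the index, and `_id` must be excluded from the projection unless it is indexed too. The check below is a simplified sketch that ignores multikey and operator edge cases. The query values and the name `is_covered` are hypothetical; only the index fields come from the slide.

```python
# Fields of profile.sync_1_installed_1_platforms.facebook_1, from the slide.
INDEX_FIELDS = ("profile.sync", "installed", "platforms.facebook")

def is_covered(index_fields, query, projection):
    """Simplified test: can the index alone answer this query?"""
    indexed = set(index_fields)
    wanted = {field for field, include in projection.items() if include}
    # _id is returned by default, so it must be switched off or be indexed.
    id_ok = projection.get("_id", 1) == 0 or "_id" in indexed
    return set(query) <= indexed and wanted <= indexed and id_ok

# Hypothetical filter values; the slides do not show the actual query.
query = {"profile.sync": True, "installed": True}

covered = is_covered(INDEX_FIELDS, query, {"platforms.facebook": 1, "_id": 0})
with_id = is_covered(INDEX_FIELDS, query, {"platforms.facebook": 1})
with_email = is_covered(INDEX_FIELDS, query, {"email": 1, "_id": 0})
print(covered, with_id, with_email)
```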
Interest Replica Set
Different set of challenges than the User replica set
● Needs to power typeahead
● 64 million interests
● Access pattern based on interest popularity
○ Lady Gaga is going to get accessed more than Ladybug, Javascript
The Typeahead
{
  "_id" : ObjectId("4f511a230624967b7d000003"),
  "name" : "Rubiks Cube",
  "search" : "rubiks cube",
  "subsearch" : [
    "r", "ru", "rub", "rubi", "rubik", "rubiks", "rubiks ", "rubiks c", "rubiks cu", "rubiks cub"
  ],
  "popularity" : NumberLong(907)
}
The Typeahead
● Add an array with the first few characters of the interest name
● Add an index on that field
● This allows us to have 10 entries in 1 index instead of 10 separate indexes
http://docs.mongodb.org/manual/core/indexes/#index-type-multikey
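
The `subsearch` array in the sample document can be generated like this. The sketch assumes the `search` field is simply the lower-cased name and that prefixes stop at ten characters, both inferred from the "Rubiks Cube" sample; `build_interest` and `MAX_PREFIX_LEN` are hypothetical names.

```python
MAX_PREFIX_LEN = 10  # the sample document stops at ten prefixes

def build_interest(name, popularity):
    """Build a typeahead document shaped like the sample on the slide."""
    search = name.lower()  # assumption: "search" is just the lower-cased name
    longest = min(len(search), MAX_PREFIX_LEN)
    subsearch = [search[:n] for n in range(1, longest + 1)]
    return {
        "name": name,
        "search": search,
        "subsearch": subsearch,
        "popularity": popularity,
    }

doc = build_interest("Rubiks Cube", 907)
print(doc["subsearch"])
```

Each element of the array gets its own entry in the multikey index, which is what lets one index serve every prefix length.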
Typeahead indexes
subsearch_1_popularity_-1
● Specifying -1 for the popularity component of the index naturally causes the typeahead to show more popular interests first
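
The effect of the descending popularity component can be imitated in memory with a sorted list of (prefix, negated popularity, name) entries. This is only an emulation of the ordering, not MongoDB's index implementation, and the popularity numbers are invented for the example.

```python
import bisect

# Invented popularity values, for illustration only.
interests = [("Lady Gaga", 5000), ("Ladybug", 40), ("Lacrosse", 300), ("Javascript", 900)]

def prefixes(name, limit=10):
    """Lower-cased prefixes of the name, up to `limit` characters."""
    text = name.lower()
    return [text[:n] for n in range(1, min(len(text), limit) + 1)]

# One entry per prefix; negating popularity sorts popular interests first,
# mirroring an index that is ascending on subsearch and descending on popularity.
index = sorted((p, -pop, name) for name, pop in interests for p in prefixes(name))

def typeahead(typed, limit=5):
    """Return up to `limit` names whose prefix equals `typed`, most popular first."""
    key = typed.lower()
    start = bisect.bisect_left(index, (key,))  # first entry with this prefix
    results = []
    for prefix, _neg_pop, name in index[start:]:
        if prefix != key or len(results) == limit:
            break
        results.append(name)
    return results

print(typeahead("la"))
```

Because matching entries are already stored in popularity order, the lookup reads them in sequence and stops at the limit, with no separate sort step.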
Lessons Learned
● Don't over index
● Covered indexes when possible
● Indexes to reduce size of returned data
● Keep everything in memory
● Multikey index for typeaheads
● Utilize -1 in index for natural sorting
SNAP Interactive, Inc.
Contact Information
● SNAP Interactive, Inc.
SNAP-Interactive.com
● Justin Medoy
Team Lead / Software [email protected]
● Mike Sherov
Lead [email protected] @mikesherov
● For more information on our open positions, email [email protected] or check our website at www.snap-interactive.com/jobs/job-openings
meet people like you