scaling facebook's realtime endpoint with mongodb, snap interactive


Post on 13-Jul-2015


Page 1: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Scaling The Facebook Realtime Endpoint Using MongoDB

PRESENTED BY: Justin Medoy and Mike Sherov
SNAP Interactive

[email protected] mikesherov@snap-interactive

Page 2: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Redefining the Way People Meet & Socialize Online

Page 3: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

What are Facebook Realtime Updates?

Facebook says: "Real-time updates enable your application to subscribe to changes in data in Facebook."

What it means: "You provide a URL, Facebook pings it when users do stuff."

Page 4: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Pings from Facebook

● Every minute we get around 20 pings from Facebook that contain data for around 11,000 users

{
  "object": "user",
  "entry": [
    { "uid": 1335845740, "changed_fields": [ "name", "picture" ], "time": 232323 },
    ....
  ]
}
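As a rough illustration of handling such a ping, the sketch below parses a payload shaped like the one above and collects the flagged fields per user. The function name `extract_changes` is hypothetical, and the payload shape is assumed to match the slide's example exactly; this is not the presenters' PHP code.

```python
import json
from typing import Dict, List


def extract_changes(payload: str) -> Dict[int, List[str]]:
    """Map each uid in a realtime ping to the fields Facebook flagged as changed."""
    body = json.loads(payload)
    # Only "user" objects are shown on the slide; anything else is ignored here.
    if body.get("object") != "user":
        return {}
    changes: Dict[int, List[str]] = {}
    for entry in body.get("entry", []):
        fields = changes.setdefault(entry["uid"], [])
        for field in entry.get("changed_fields", []):
            # A uid may appear in several entries; keep each field once.
            if field not in fields:
                fields.append(field)
    return changes


# Sample payload copied from the slide, without the elided entries.
sample = (
    '{"object": "user", "entry": ['
    '{"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323}'
    "]}"
)
changes = extract_changes(sample)
```

Note that the result only says which fields to look at, not what their values are.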

Page 5: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

WHAT?!? Where's the data?

● Facebook tells you which fields changed, but not what the current data is.

Page 6: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Retrieving User Data from the Graph

● Solution: go back to Facebook and grab the user's data
  https://graph.facebook.com?ids=<USERID>&fields=music,movies,likes
  *This will only get data that the user has made publicly available

● To avoid timeouts, each call to Facebook only asks for the data for 25 users
  *Our CURL timeouts for Facebook have been lowered from the default 60 seconds to 25 seconds
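The batching described above can be sketched as follows. This only builds the request URLs and performs no network calls. The helper names are hypothetical, and passing several user IDs as a comma-separated `ids` value is an assumption based on the `ids=<USERID>` form shown on the slide.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"   # from the slide
FIELDS = ("music", "movies", "likes")      # from the slide
BATCH_SIZE = 25                            # users per call, from the slide
TIMEOUT_SECONDS = 25                       # lowered from 60; for whatever HTTP client is used


def chunk(ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_graph_urls(user_ids: Iterable[int]) -> List[str]:
    """Build one Graph URL per batch of up to 25 users."""
    urls = []
    for batch in chunk(list(user_ids)):
        query = urlencode(
            {
                # Assumption: multiple ids are comma-separated in one parameter.
                "ids": ",".join(str(uid) for uid in batch),
                "fields": ",".join(FIELDS),
            },
            safe=",",  # keep commas readable instead of percent-encoding them
        )
        urls.append(f"{GRAPH_URL}?{query}")
    return urls


urls = build_graph_urls(range(1, 61))  # 60 users -> batches of 25, 25, 10
```

Smaller batches trade more round trips for a lower chance that any single call hits the 25-second timeout.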

Page 7: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Update the user's profile

● Facebook won't tell you exactly what's changed, but we can figure it out from our own data

All Data - Stored Data = Changed Data

● The next step is to update the user's profile with this changed data
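The "All Data - Stored Data = Changed Data" step can be read as a simple dictionary difference. The sketch below is one possible interpretation, with hypothetical field values and a hypothetical function name; how the presenters handled removed fields or nested data is not stated, so this version only reports fields that are new or whose value differs.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel so a stored value of None is not mistaken for "absent"


def changed_data(fetched: Dict[str, Any], stored: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fetched fields whose values differ from what is stored."""
    return {
        field: value
        for field, value in fetched.items()
        if stored.get(field, _MISSING) != value
    }


# Illustrative values only; they are not taken from the presentation.
fetched = {"name": "Ann", "music": ["Lady Gaga"], "movies": ["Up"]}
stored = {"name": "Ann", "music": ["Lady Gaga"], "movies": []}
delta = changed_data(fetched, stored)
```

A dictionary like `delta` could then serve as the argument of a MongoDB `$set` update, so that only the changed fields are written.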

Page 8: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Mongo Architecture

● Mongo 2.0.2
● Mongo PHP driver 1.2.10
● Two separate replica sets
  ○ User data
  ○ Interest data
● Why separate replica sets?
  ○ Keep as much of the index as possible in memory
  ○ Disk reads are expensive

Page 9: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

User Data Replica Set

Design Challenge
● Random access pattern over 106 million documents

Page 10: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

User Data Replica Set

● Large $in queries
● High page faults in MMS
● We upgraded from 32G to 128G on each node

Page 11: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Indexes

● We added duplicates of some of our indexes with reversed fields

● Updating all of these extra indexes was a huge bottleneck

Page 12: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Indexes

● Unique index uid_1
● profile.sync_1_installed_1_platforms.facebook_1
● email_1
● uid_1_installed_1
● last_login_1_uid_1

Page 13: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Indexes

● There were certain minutes when Facebook would tell us that the data had changed for more than 40,000 users
  ○ Limit the amount of data Facebook can send in one minute
● High number of writes and a large number of indexes prevented the secondaries from reading the oplog because of the global write lock
  ○ Increase the size of the oplog
  ○ This is fixed in 2.2.1

Page 14: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Indexes and the realtime endpoint

profile.sync_1_installed_1_platforms.facebook_1
● Filtered 11,000 users a minute down to a few hundred
  ○ Moved filtering logic out of PHP into the index
● Added efficiency from covered index
  ○ All we need is platforms.facebook, which is part of the index

Page 15: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Interest Replica Set

Different set of challenges than the User replica set
● Needs to power typeahead
● 64 million interests
● Access pattern based on interest popularity
  ○ Lady Gaga is going to get accessed more than Ladybug, Javascript

Page 16: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

The Typeahead

{
  "_id" : ObjectId("4f511a230624967b7d000003"),
  "name" : "Rubiks Cube",
  "search" : "rubiks cube",
  "subsearch" : [
    "r", "ru", "rub", "rubi", "rubik", "rubiks", "rubiks ", "rubiks c", "rubiks cu", "rubiks cub"
  ],
  "popularity" : NumberLong(907)
}

Page 17: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

The Typeahead

● Add an array with the first few characters of the interest
● Add an index on that field
● This allows us to have 10 entries in 1 index instead of 10 separate indexes

http://docs.mongodb.org/manual/core/indexes/#index-type-multikey
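A minimal sketch of how such a `subsearch` array might be generated is shown below. The function names are hypothetical. The cap of 10 characters is an assumption inferred from the "Rubiks Cube" example, where the 11-character name yields prefixes of length 1 to 10; the slides do not say how names shorter than the cap were handled.

```python
from typing import Any, Dict, List


def make_subsearch(name: str, max_len: int = 10) -> List[str]:
    """Return the lowercase prefixes of `name`, from 1 up to `max_len` characters."""
    search = name.lower()
    longest = min(len(search), max_len)
    return [search[:n] for n in range(1, longest + 1)]


def make_interest_doc(name: str, popularity: int) -> Dict[str, Any]:
    """Build a document shaped like the slide's example (without the _id)."""
    return {
        "name": name,
        "search": name.lower(),
        "subsearch": make_subsearch(name),
        "popularity": popularity,
    }


doc = make_interest_doc("Rubiks Cube", 907)
```

Because `subsearch` is an array, a single index on it is multikey: MongoDB adds one index entry per array element, which is what lets one index stand in for ten.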

Page 18: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Typeahead indexes

subsearch_1_popularity_-1
● Specifying -1 for the popularity component of the index naturally causes the typeahead to show more popular interests first
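To make the ordering concrete, here is an in-memory sketch of what a typeahead lookup against this index returns: match the typed prefix against `subsearch`, then order by `popularity` descending. It does not talk to MongoDB. The popularity numbers for Lady Gaga and Ladybug are invented for illustration, and the collection name in the comment is an assumption.

```python
from typing import Any, Dict, List


def prefixes(name: str, max_len: int = 10) -> List[str]:
    """Lowercase prefixes of `name`, capped at `max_len` characters (assumed cap)."""
    search = name.lower()
    return [search[:n] for n in range(1, min(len(search), max_len) + 1)]


def typeahead(interests: List[Dict[str, Any]], typed: str, limit: int = 5) -> List[str]:
    """Return up to `limit` interest names matching `typed`, most popular first.

    Roughly what a shell query such as
        db.interests.find({subsearch: "lad"}).sort({popularity: -1}).limit(5)
    would return, where `interests` is a hypothetical collection name.
    """
    prefix = typed.lower()
    matches = [doc for doc in interests if prefix in doc["subsearch"]]
    matches.sort(key=lambda doc: doc["popularity"], reverse=True)
    return [doc["name"] for doc in matches[:limit]]


# Illustrative data: only the Rubiks Cube popularity (907) comes from the slides.
interests = [
    {"name": name, "subsearch": prefixes(name), "popularity": popularity}
    for name, popularity in [("Ladybug", 120), ("Rubiks Cube", 907), ("Lady Gaga", 50000)]
]
suggestions = typeahead(interests, "Lad")
```

With the compound index, the database can walk the entries for a given prefix in popularity order directly, rather than sorting the matches afterwards as this sketch does.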

Page 19: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

Lessons Learned

● Don't over index
● Covered indexes when possible
● Indexes to reduce size of returned data
● Keep everything in memory
● Multikey index for typeaheads
● Utilize -1 in index for natural sorting

Page 20: Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive

SNAP Interactive, Inc.
Contact Information

● SNAP Interactive, Inc.
  SNAP-Interactive.com

● Justin Medoy
  Team Lead / Software [email protected]

● Mike Sherov
  Lead [email protected] @mikesherov

● For more information on our open positions, email [email protected] or check our website at www.snap-interactive.com/jobs/job-openings

meet people like you