Feed at LinkedIn (Quora Talk)

Uploaded by shubham-gupta on 23-Jan-2018

Category: Engineering

Feed @LinkedIn: Overview

Ankit Gupta, Engineer, LinkedIn

Shubham Gupta, Engineer, LinkedIn

Vivek Nelamangala, Engineer, LinkedIn


Today’s agenda

15:00 The LinkedIn Feed

15:15 Activity Store

15:20 FollowFeed

15:25 Operational Story of FollowFeed

15:35 Q&A


The LinkedIn Feed

The Feed is:

● The personalized “home page” of LinkedIn

● A heterogeneous list of updates produced by a user’s network

● Alongside those updates, we show other recommendations (jobs, articles, people) and monetize through native ads

Mission: give professionals the power to stay informed, making them more productive and successful every day.

Feed Composition

(Diagram: an example feed annotated with the update types it mixes: shares, likes, comments, connection updates, organic updates, sponsored content, news recommendations, and other recommendations.)

Feed is an ecosystem

Creation

● Member sharing

● Member publishing

● Video

● Editorial Tools

Discovery / Engagement

● Feed Experience

● Follow Ecosystem

● Conversations/Comments

● Sponsored Updates

● Control of Feed

● Promotions

● Feed Relevance

● Feed delivered as Email

● Notifications


Activity Data Model

● AVO (actor, verb, object) triples

○ Actor: “Jeff Weiner”

○ Verb: “Shared”

○ Object: “Article”

● Visibility

○ [Public, Self, Connections, Follow]

● Domain Entity (optional)

○ Activity “Foreign Key”

○ e.g. comment-id for comment activity
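The activity data model above maps naturally onto a small record type. The sketch below is illustrative only: the field names (`activity_id`, `obj`, `domain_entity`) and the typed-id strings such as `"member:1"` are assumptions, not LinkedIn's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    """The four visibility levels listed on the slide."""
    PUBLIC = "public"
    SELF = "self"
    CONNECTIONS = "connections"
    FOLLOW = "follow"


@dataclass(frozen=True)
class Activity:
    activity_id: str
    actor: str                # A: who acted, e.g. "member:1"
    verb: str                 # V: what they did, e.g. "shared"
    obj: str                  # O: what they acted on, e.g. "article:42"
    visibility: Visibility
    # Optional domain entity: a "foreign key" into the system that owns the
    # activity, e.g. the comment id for a comment activity.
    domain_entity: Optional[str] = None


share = Activity("act:1", "member:1", "shared", "article:42", Visibility.PUBLIC)
comment = Activity("act:2", "member:2", "commented", "article:42",
                   Visibility.CONNECTIONS, domain_entity="comment:7")
```

Keeping the domain entity as an opaque key means the feed never duplicates the comment body itself; it only points at the system that owns it.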

Feed serving architecture (diagram)

1. Homepage request: the client sends a request to the Feed Mixer, which blends results across data sources.

2. Fetch the viewer’s network and features.

3. Call data sources.

4. Resolve activities.

Components shown: Social Graph Service, Features store, FollowFeed, Sponsored Content (Ads), Trending in Industry, People You May Know, Jobs, Recommendations, Activities store, Activities Database, Activities stream, Stats Server, Sharing, and Profiles, connected by read and write paths.
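The four request steps can be sketched as a single function. This is a hypothetical, heavily simplified mixer: each data source is modeled as a callable returning `(score, activity_id)` pairs, feature fetching is omitted, and blending is a naive global sort by score, which is an assumption; the real Feed Mixer's blending logic is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[float, str]                    # (score, activity_id)
DataSource = Callable[[List[str]], List[Candidate]]


def mix_feed(viewer: str,
             get_network: Callable[[str], List[str]],
             sources: Dict[str, DataSource],
             resolve: Callable[[List[str]], List[str]],
             limit: int = 10) -> List[str]:
    # Step 2: fetch the viewer's network (feature fetching is omitted here).
    network = get_network(viewer)
    # Step 3: call every data source in parallel with the same network.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(source, network) for source in sources.values()]
        candidates = [c for f in futures for c in f.result()]
    # Blend: a naive global sort by score (an assumption, not the real policy).
    candidates.sort(key=lambda c: c[0], reverse=True)
    top_ids = [activity_id for _, activity_id in candidates[:limit]]
    # Step 4: turn activity ids into renderable activities.
    return resolve(top_ids)


graph = {"viewer": ["member:1", "company:2"]}
activities = {"a1": "member:1 shared an article",
              "a2": "company:2 posted a job",
              "ad1": "sponsored update"}
sources = {
    "followfeed": lambda net: [(0.9, "a1"), (0.4, "a2")],
    "ads": lambda net: [(0.5, "ad1")],
}
feed = mix_feed("viewer", graph.get, sources,
                lambda ids: [activities[i] for i in ids], limit=2)
```

Calling the sources in parallel keeps request latency close to that of the slowest source rather than the sum of all of them.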


Activity Store


● Data is stored in Espresso, LinkedIn’s in-house distributed document database

● Keyed by activity ID

● Unified Social Content Platform

● Flexible schema

● Single distribution pipeline
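A toy in-memory store helps illustrate three of the properties above: keying by activity ID, a flexible schema, and a single distribution pipeline. The `ActivityStore` class and its `put`/`get`/`subscribe` methods are hypothetical and are not Espresso's API. Documents are plain dicts, so different activity types can carry different fields.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Consumer = Callable[[str, Document], None]


class ActivityStore:
    """In-memory stand-in for a document store keyed by activity ID."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}
        self._consumers: List[Consumer] = []

    def subscribe(self, consumer: Consumer) -> None:
        # Downstream systems (for example an indexer) attach to the one pipeline.
        self._consumers.append(consumer)

    def put(self, activity_id: str, doc: Document) -> None:
        self._docs[activity_id] = doc
        # Single distribution pipeline: every write, whatever its activity type,
        # is fanned out through the same path to every consumer.
        for consumer in self._consumers:
            consumer(activity_id, doc)

    def get(self, activity_id: str) -> Document:
        return self._docs[activity_id]


store = ActivityStore()
indexed = []
store.subscribe(lambda aid, doc: indexed.append((doc["actor"], aid)))
store.put("act:1", {"actor": "member:1", "verb": "shared", "object": "article:42"})
# A comment activity carries an extra field; no schema change is needed.
store.put("act:2", {"actor": "member:2", "verb": "commented",
                    "object": "article:42", "comment_id": "comment:7"})
```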


FollowFeed


● Term-partitioned index

○ Unlike generic search indices, which are partitioned by document, it is partitioned by actor (e.g., member:1, school:2)

● Each actor has a posting list: a reverse-chronologically ordered list of activities by or about that actor
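The following sketch shows a term-partitioned index in miniature. The actor (the "term") selects the partition, and each actor's posting list stays newest-first. The class name, the use of `zlib.crc32` as the partitioning hash, and the in-memory layout are illustrative assumptions.

```python
import bisect
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def partition_of(actor: str, num_partitions: int) -> int:
    # Partition by the term (the actor), using a stable hash.
    return zlib.crc32(actor.encode("utf-8")) % num_partitions


class TermPartitionedIndex:
    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # One map per partition: actor -> posting list of (-timestamp, activity_id).
        self.partitions: List[DefaultDict[str, List[Tuple[int, str]]]] = [
            defaultdict(list) for _ in range(num_partitions)
        ]

    def add(self, actor: str, timestamp: int, activity_id: str) -> None:
        shard = self.partitions[partition_of(actor, self.num_partitions)]
        # Negating the timestamp keeps each posting list newest-first.
        bisect.insort(shard[actor], (-timestamp, activity_id))

    def postings(self, actor: str) -> List[str]:
        # A single partition holds the actor's whole posting list.
        shard = self.partitions[partition_of(actor, self.num_partitions)]
        return [activity_id for _, activity_id in shard.get(actor, [])]


index = TermPartitionedIndex(num_partitions=4)
index.add("member:1", 100, "a1")
index.add("school:2", 150, "s1")
index.add("member:1", 300, "a3")
index.add("member:1", 200, "a2")
```

Because an actor's entire timeline lives in one partition, reading it touches one partition. In a document-partitioned index, the same read would have to query every partition.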

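FollowFeed architecture (diagram)

● Write path: the Activities stream flows into a Partitioner cluster, which writes to a Storage cluster split into partitions 1, 2, …, N. Actors (m1, m2, c1, m3, m4, m5, m6, s7, s8) are spread across the partitions.

● Read path: the Query cluster receives “Get updates for viewer with network {m1, c1, m6}” and splits it into per-partition sub-queries, {m1, c1} and {m6}.

● A Features Pipeline is also shown.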
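The query cluster's split-and-merge step can be sketched as a scatter-gather. The sketch groups the viewer's network by partition, asks each partition for its newest-first updates, and lazily merges the results into one timeline. The partitioning hash and the `query_partition` stand-in are assumptions. Sub-queries run sequentially here for simplicity, whereas a real query cluster would issue them to storage nodes in parallel.

```python
import heapq
import itertools
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Update = Tuple[int, str]  # (timestamp, activity_id)


def partition_of(actor: str, num_partitions: int) -> int:
    return zlib.crc32(actor.encode("utf-8")) % num_partitions


def split_by_partition(network: List[str], num_partitions: int) -> Dict[int, List[str]]:
    groups: Dict[int, List[str]] = defaultdict(list)
    for actor in network:
        groups[partition_of(actor, num_partitions)].append(actor)
    return dict(groups)


def get_updates(network: List[str], num_partitions: int,
                query_partition: Callable[[int, List[str]], List[Update]],
                limit: int) -> List[Update]:
    # Scatter: one sub-query per partition that owns part of the network.
    per_partition = [query_partition(p, actors)
                     for p, actors in split_by_partition(network, num_partitions).items()]
    # Gather: each sub-result is newest-first, so a lazy k-way merge suffices.
    merged = heapq.merge(*per_partition, key=lambda u: u[0], reverse=True)
    return list(itertools.islice(merged, limit))


timelines = {"m1": [(300, "m1-b"), (100, "m1-a")],
             "c1": [(250, "c1-a")],
             "m6": [(400, "m6-a"), (50, "m6-b")]}


def query_partition(partition: int, actors: List[str]) -> List[Update]:
    # Stand-in for a storage node: newest-first updates for its actors.
    return sorted((u for a in actors for u in timelines.get(a, [])), reverse=True)


top = get_updates(["m1", "c1", "m6"], 4, query_partition, limit=3)
```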

FollowFeed Storage: Layers

● Timeline Index

● Filtering

● Ranking

● topK

FollowFeed

select activities
  where (required)
    actor IN <my network>
    TimeRange between [start, end] OR count = <C>
  where (optional)
    (actor type | verb type | object type) in <X>
    Visibility is (connections-only OR followees-only OR public)
  sort by time OR relevance
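The query shape above can be run through the storage layers (timeline index → filtering → ranking → topK) in a short in-memory sketch. `FeedQuery`, `run_query`, the typed-id convention (`"member:1"`), and the pluggable `relevance` function are all hypothetical. The real system ranks with relevance features stored next to the data, not with a caller-supplied lambda. The sketch also assumes a query may supply a time range, a count, or both.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Activity:
    activity_id: str
    actor: str        # typed id such as "member:1" or "company:2"
    verb: str
    obj: str          # typed id such as "article:1"
    visibility: str   # "connections-only", "followees-only" or "public"
    timestamp: int


@dataclass
class FeedQuery:
    network: Set[str]                              # required
    time_range: Optional[Tuple[int, int]] = None   # required: this and/or count
    count: Optional[int] = None
    actor_types: Optional[Set[str]] = None         # optional filters; None = any
    verbs: Optional[Set[str]] = None
    object_types: Optional[Set[str]] = None
    visibilities: Optional[Set[str]] = None
    sort_by: str = "time"                          # "time" or "relevance"


def type_of(entity_id: str) -> str:
    return entity_id.split(":", 1)[0]


def run_query(timelines: Dict[str, List[Activity]], q: FeedQuery,
              relevance: Callable[[Activity], float] = lambda a: 0.0) -> List[Activity]:
    if q.time_range is None and q.count is None:
        raise ValueError("a query needs a time range or a count")

    def keep(a: Activity) -> bool:
        # Filtering layer: required time range, then the optional predicates.
        if q.time_range is not None and not q.time_range[0] <= a.timestamp <= q.time_range[1]:
            return False
        checks = [(q.actor_types, type_of(a.actor)), (q.verbs, a.verb),
                  (q.object_types, type_of(a.obj)), (q.visibilities, a.visibility)]
        return all(allowed is None or value in allowed for allowed, value in checks)

    # Timeline index layer: read only the posting lists of actors in the network.
    matches = [a for actor in q.network for a in timelines.get(actor, []) if keep(a)]
    # Ranking layer: by time, or by a caller-supplied relevance score.
    key = (lambda a: a.timestamp) if q.sort_by == "time" else relevance
    # topK layer: only the best `count` results leave the node.
    k = q.count if q.count is not None else len(matches)
    return heapq.nlargest(k, matches, key=key)


timelines = {
    "member:1": [Activity("a1", "member:1", "shared", "article:1", "public", 300),
                 Activity("a2", "member:1", "liked", "article:2", "connections-only", 200)],
    "company:2": [Activity("a3", "company:2", "posted", "job:9", "public", 250)],
}
q = FeedQuery(network={"member:1", "company:2"}, count=2, visibilities={"public"})
result_ids = [a.activity_id for a in run_query(timelines, q)]
```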


Why embedded?

● Bring computation closer to the data

○ Allows scoring of tens of millions of records per second

○ Less data transferred over the wire

● Colocation of relevance features and data

○ Document features

● RocksDB as the embedded database

○ Open-sourced by Facebook in 2013


Operational Story of FollowFeed

Performance testing during development (diagram)

● Access logs from the legacy system are gathered by a log collection system (log collector → disk).

● A log processor replays them against FollowFeed as live, read-only traffic.

● The diagram also labels traffic and metrics.
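A replay harness in the spirit of this setup could look like the sketch below. It parses logged read requests and sends them to a target in order, optionally preserving the logged inter-arrival gaps, then summarizes latency. The tab-separated log format, the function names, and the `speedup` knob are assumptions. Only reads are replayed, so the target's state is never modified.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Request = Tuple[int, str, List[str]]   # (timestamp_ms, viewer, network)


def parse_access_log(lines: Iterable[str]) -> Iterator[Request]:
    # Hypothetical access-log format: "<timestamp_ms>\t<viewer>\t<actor,actor,...>"
    for line in lines:
        ts, viewer, network = line.rstrip("\n").split("\t")
        yield int(ts), viewer, network.split(",")


def replay(requests: Iterable[Request],
           target: Callable[[str, List[str]], object],
           speedup: float = 0.0) -> Dict[str, float]:
    """Replay logged read requests against `target` and summarize latency (ms).

    speedup=0 sends requests back to back; a positive value preserves the
    logged inter-arrival gaps, compressed by that factor.
    """
    latencies: List[float] = []
    prev_ts = None
    for ts, viewer, network in requests:
        if speedup > 0 and prev_ts is not None:
            time.sleep(max(0, ts - prev_ts) / 1000 / speedup)
        prev_ts = ts
        start = time.perf_counter()
        target(viewer, network)          # read-only: responses are discarded
        latencies.append((time.perf_counter() - start) * 1000)
    if not latencies:
        return {"count": 0}
    return {"count": len(latencies),
            "p50_ms": statistics.median(latencies),
            "max_ms": max(latencies)}
```

Replaying recorded production traffic exercises realistic request mixes and hot keys, which synthetic load generators often miss.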

Development process

Code → Check in → Staging (integration testing) → Dark canary → Canary → Production

Dark Canary (diagram)

● A request dispatcher forwards each request to a FollowFeed live node, whose response goes back to the caller.

● The dispatcher also sends a copy of the request to a FollowFeed dark canary node, which is used only to collect metrics.
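A dispatcher like the one in the diagram can be sketched with a thread pool. The caller always gets the live node's response. A copy of the request goes to the dark canary asynchronously, and only the canary's latency and errors are recorded. The class and method names are hypothetical; the live and canary nodes are plain callables here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Tuple


class DarkCanaryDispatcher:
    """Serve from the live node; mirror a copy of each request to a dark canary."""

    def __init__(self, live: Callable[[Any], Any], canary: Callable[[Any], Any],
                 workers: int = 4) -> None:
        self.live = live
        self.canary = canary
        self.pool = ThreadPoolExecutor(max_workers=workers)
        # (latency_ms, error) per canary call; list.append is atomic in CPython.
        self.metrics: List[Tuple[float, Optional[str]]] = []

    def _shadow(self, request: Any) -> None:
        start = time.perf_counter()
        error = None
        try:
            self.canary(request)          # the canary's response is discarded
        except Exception as exc:          # canary failures never reach the caller
            error = repr(exc)
        self.metrics.append(((time.perf_counter() - start) * 1000, error))

    def handle(self, request: Any) -> Any:
        self.pool.submit(self._shadow, request)   # fire-and-forget copy
        return self.live(request)

    def close(self) -> None:
        self.pool.shutdown(wait=True)
```

Comparing the canary's metrics against the live node's lets a new build see production load before it serves any user-visible response.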

No Alerts Left Behind

● Meaningful thresholds

● Each alert has an action

● Alerts only on external symptoms

● An underlying issue triggers only a single alert
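The four alerting rules can be encoded directly, as in the sketch below. Every rule must carry an action, must watch an external symptom, and declares which underlying issue it belongs to, so that only the first breached rule per issue pages. The `client.` metric-name prefix and the static `issue` grouping are hypothetical conventions. Real systems usually infer related alerts at runtime rather than declaring them ahead of time.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str        # must be an externally visible symptom
    threshold: float   # a meaningful threshold, tuned per metric
    action: str        # what the on-call engineer should do
    issue: str         # rules sharing an issue describe one underlying problem

    def __post_init__(self) -> None:
        if not self.action:
            raise ValueError(f"{self.name}: every alert needs an action")
        # Hypothetical convention: client-facing symptom metrics start with "client.".
        if not self.metric.startswith("client."):
            raise ValueError(f"{self.name}: alert on external symptoms only")


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return the alerts to page on; rules are assumed to be in priority order."""
    fired: List[str] = []
    seen_issues: Set[str] = set()
    for rule in rules:
        breached = metrics.get(rule.metric, 0.0) > rule.threshold
        if breached and rule.issue not in seen_issues:
            seen_issues.add(rule.issue)   # later rules for this issue stay quiet
            fired.append(rule.name)
    return fired
```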

Backup (diagram)

● A cron script calls the backup API on a FollowFeed node.

● The resulting backups (Backup 1, Backup 2, …, Backup N) are copied to HDFS.
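One run of the cron-driven backup might look like the following sketch. It triggers a backup on the node, uploads it with the standard `hdfs dfs -put` shell command, and prunes old copies. The node's backup API is abstracted as a hypothetical `trigger_backup` hook. The zero-padded `backup-0001` naming and the keep-three retention policy are assumptions.

```python
import subprocess
from typing import Callable, List


def hdfs_put(local_path: str, hdfs_dir: str) -> None:
    # Standard HDFS shell upload; assumes the `hdfs` CLI is installed and on PATH.
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)


def run_backup(trigger_backup: Callable[[], str],
               list_backups: Callable[[str], List[str]],
               delete_backup: Callable[[str], None],
               hdfs_dir: str,
               copy_to_hdfs: Callable[[str, str], None] = hdfs_put,
               keep: int = 3) -> str:
    """One cron-triggered run: back up the node, ship it to HDFS, prune old copies."""
    if keep < 1:
        raise ValueError("keep at least one backup")
    local_path = trigger_backup()              # the node's backup API (hypothetical hook)
    copy_to_hdfs(local_path, hdfs_dir)
    # Names sort oldest-first under the assumed zero-padded "backup-0001" scheme.
    for old in sorted(list_backups(hdfs_dir))[:-keep]:
        delete_backup(old)
    return local_path
```

A crontab entry such as `0 */6 * * * python3 followfeed_backup.py` (hypothetical script name) would run this every six hours.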

Restore (diagram)

● A new FollowFeed node copies backups (Backup 1, Backup 2, …, Backup N) from HDFS.
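The restore side can be sketched as "find the newest backup in HDFS and copy it down", using the standard `hdfs dfs -get` command. The `backup-<id>` naming scheme and the function names are assumptions. Ids are compared numerically so that `backup-10` counts as newer than `backup-9`.

```python
import re
import subprocess
from typing import Callable, Iterable, List, Optional

BACKUP_NAME = re.compile(r"backup-(\d+)")   # hypothetical naming scheme


def hdfs_get(hdfs_path: str, local_dir: str) -> None:
    # Standard HDFS shell download; assumes the `hdfs` CLI is on PATH.
    subprocess.run(["hdfs", "dfs", "-get", hdfs_path, local_dir], check=True)


def latest_backup(names: Iterable[str]) -> Optional[str]:
    # Compare numeric ids so that backup-10 sorts after backup-9.
    best = None
    for name in names:
        m = BACKUP_NAME.fullmatch(name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), name)
    return best[1] if best else None


def restore(list_backups: Callable[[str], List[str]], hdfs_dir: str, local_dir: str,
            copy_from_hdfs: Callable[[str, str], None] = hdfs_get) -> str:
    """Seed a new FollowFeed node with the newest backup found in HDFS."""
    name = latest_backup(list_backups(hdfs_dir))
    if name is None:
        raise RuntimeError(f"no backups under {hdfs_dir}")
    copy_from_hdfs(f"{hdfs_dir}/{name}", local_dir)
    return name
```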

Rebalancing (diagram)

A rebalance script redistributes 60 partitions when a fourth box is added:

● Before: Box 1 holds partitions 0..19, Box 2 holds 20..39, Box 3 holds 40..59

● After: Box 1 holds 0..14, Box 2 holds 15..29, Box 3 holds 30..44, Box 4 holds 45..59
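The before/after layout above can be reproduced by assigning contiguous, near-equal partition ranges and then computing which partitions each box must fetch. This is a minimal sketch of that arithmetic; the function names are hypothetical, and boxes are numbered from 1 to match the slide.

```python
from typing import Dict, List


def assign_ranges(num_partitions: int, num_boxes: int) -> List[range]:
    """Split partitions 0..num_partitions-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_partitions, num_boxes)
    ranges, start = [], 0
    for i in range(num_boxes):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def plan_moves(before: List[range], after: List[range]) -> Dict[int, List[int]]:
    """Map each box (1-indexed, as on the slide) to the partitions it must fetch."""
    owner = {p: box for box, r in enumerate(before, start=1) for p in r}
    plan: Dict[int, List[int]] = {}
    for box, r in enumerate(after, start=1):
        incoming = [p for p in r if owner.get(p) != box]
        if incoming:
            plan[box] = incoming
    return plan


before = assign_ranges(60, 3)   # Box 1: 0..19, Box 2: 20..39, Box 3: 40..59
after = assign_ranges(60, 4)    # Box 1: 0..14, Box 2: 15..29, Box 3: 30..44, Box 4: 45..59
moves = plan_moves(before, after)
```

With contiguous ranges, 30 of the 60 partitions change owner (Box 2 fetches 15..19, Box 3 fetches 30..39, Box 4 fetches 45..59). A minimal-movement scheme would move only the 15 partitions the new box needs, at the cost of non-contiguous ranges. Each moved partition can be seeded from its HDFS backup instead of being copied from a live box.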

Rebalancing (diagram)

● An extension of the restore script: new FollowFeed nodes (Node 1, Node 2, Node 3, …, Node N) copy their backups (Backup 1, Backup 2, Backup 3, …, Backup N) from HDFS.


Questions


Appendix

Timeline storage structure (diagram)

● A timeline is stored as a chain of blobs: Blob 1, Blob 2, Blob 3, …

● Each blob contains a blob header, then updates in reverse chronological order (Update N, Update N-1, Update N-2, …), then the key of the next blob
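The blob chain can be modeled over any key-value store. In the sketch below, a plain dict stands in for the embedded store. A head pointer names the newest blob, and each blob holds a header, its updates newest-first, and the key of the next (older) blob. The key names (`<actor>:head`, `<actor>:blob:<seq>`), the header fields, the JSON encoding, and the per-blob capacity are all assumptions. The direction of the "next blob" link is inferred from the reverse-chronological ordering.

```python
import json
from typing import Dict, Iterator, Optional


class TimelineStore:
    """A timeline as a linked list of fixed-capacity blobs in a key-value store."""

    def __init__(self, kv: Dict[str, str], blob_capacity: int = 3) -> None:
        self.kv = kv
        self.capacity = blob_capacity

    def append(self, actor: str, update: str) -> None:
        head_key: Optional[str] = self.kv.get(f"{actor}:head")
        head = json.loads(self.kv[head_key]) if head_key else None
        if head is None or head["header"]["count"] >= self.capacity:
            # Start a new head blob that links to the previous (older) one.
            seq = head["header"]["seq"] + 1 if head else 1
            head = {"header": {"seq": seq, "count": 0}, "updates": [], "next": head_key}
            head_key = f"{actor}:blob:{seq}"
            self.kv[f"{actor}:head"] = head_key
        head["updates"].insert(0, update)      # newest first within a blob
        head["header"]["count"] += 1
        self.kv[head_key] = json.dumps(head)

    def read(self, actor: str) -> Iterator[str]:
        # Walk from the newest blob, following each "next blob key".
        key = self.kv.get(f"{actor}:head")
        while key:
            blob = json.loads(self.kv[key])
            yield from blob["updates"]
            key = blob["next"]


kv: Dict[str, str] = {}
timeline = TimelineStore(kv, blob_capacity=2)
for i in range(1, 6):
    timeline.append("member:1", f"u{i}")
```

Reading the newest K updates touches only the first few blobs, so a topK query over a long timeline can stop early instead of scanning the whole history.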