Feed at LinkedIn (Quora talk)

Post on 23-Jan-2018

Feed @LinkedIn: Overview

Ankit Gupta, Engineer, LinkedIn

Shubham Gupta, Engineer, LinkedIn

Vivek Nelamangala, Engineer, LinkedIn

Today’s agenda

15:00 The LinkedIn Feed

15:15 Activity Store

15:20 FollowFeed

15:25 Operational Story of FollowFeed

15:35 Q&A

The LinkedIn Feed

The personalized “home page” of LinkedIn

A heterogeneous list of updates produced by a user’s network

We also show other recommendations (jobs, articles, people) and monetize via native ads.

Mission: Give professionals the power to stay informed to make them more productive and successful every day.

The Feed is:

Share

Like

Sponsored Content

Comment

News recommendation

Connect

Organic updates

Recommendation

Feed Composition

Creation

● Member sharing

● Member publishing

● Video

● Editorial Tools

Discovery / Engagement

● Feed Experience

● Follow Ecosystem

● Conversations/Comments

● Sponsored Updates

● Control of Feed

● Promotions

● Feed Relevance

● Feed delivered as Email

● Notifications

Feed is an ecosystem

Activity Data Model

● AVO triples

○ Actor: “Jeff Weiner”

○ Verb: “Shared”

○ Object: “Article”

● Visibility

○ [Public, Self, Connections, Follow]

● Domain Entity (optional)

○ Activity “Foreign Key”

○ e.g. comment-id for comment activity
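The data model above can be sketched as a small record type. This is only an illustration: the class and field names (`Activity`, `obj`, `domain_entity`) and the `comment:42` identifier are assumptions made for the example, not the actual LinkedIn schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    """The four visibility levels listed on the slide."""
    PUBLIC = "public"
    SELF = "self"
    CONNECTIONS = "connections"
    FOLLOW = "follow"


@dataclass(frozen=True)
class Activity:
    """One Actor-Verb-Object triple plus its visibility (hypothetical layout)."""
    actor: str
    verb: str
    obj: str
    visibility: Visibility
    # Optional "foreign key" into the owning domain, e.g. a comment id.
    domain_entity: Optional[str] = None


# The example triple from the slide.
share = Activity("Jeff Weiner", "Shared", "Article", Visibility.PUBLIC)

# A comment activity that points back at its domain entity (made-up id).
comment = Activity(
    "member:1", "Commented", "Article", Visibility.CONNECTIONS,
    domain_entity="comment:42",
)
```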

1. Homepage request

2. Fetch viewer’s network and features

3. Call data sources

4. Resolve activities

[Figure: feed serving architecture. Components shown: Client; Feed Mixer (blends across data sources); the data sources FollowFeed, Sponsored Content (Ads), Trending in Industry, People You May Know and Jobs Recommendations; Social Graph Service; Activities store; Activities database; Activities stream; Features store; Stats Server; Sharing; Profiles. Arrows are labeled Read and Write.]
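The four request steps on this slide can be sketched as a single function. Everything here is hypothetical: the function name, the callable signatures, and the idea that "blending" is a simple de-duplicating concatenation are assumptions for illustration. The real Feed Mixer also fetches features and ranks across sources, which this sketch leaves out.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: a data source maps the viewer's network to activity ids.
DataSource = Callable[[List[str]], List[str]]


def handle_homepage_request(
    viewer: str,
    fetch_network: Callable[[str], List[str]],
    data_sources: Dict[str, DataSource],
    resolve: Callable[[str], dict],
) -> List[dict]:
    """Steps 1-4 from the slide, with blending reduced to ordered de-duplication."""
    # Step 2: fetch the viewer's network (feature lookup omitted in this sketch).
    network = fetch_network(viewer)

    # Step 3: call every data source and collect candidate activity ids.
    candidate_ids: List[str] = []
    for source in data_sources.values():
        candidate_ids.extend(source(network))

    # Blend: drop duplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(candidate_ids))

    # Step 4: resolve ids into full activities (e.g. from the activities store).
    return [resolve(activity_id) for activity_id in unique_ids]
```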

Activity Store

Data stored in an in-house distributed document database called Espresso

Keyed by Activity ID

Unified Social Content Platform

Flexible Schema

Single distribution pipeline

FollowFeed

● Term-partitioned index

○ Different from generic search indices, which are partitioned by document

○ Partitioned by actor (e.g. member:1, school:2)

● Each posting list is a reverse-chronological list of activities by or about an actor
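A posting list per actor, kept newest first, can be sketched in memory as follows. The class and method names are invented for the example, and a sorted Python list stands in for whatever storage FollowFeed actually uses.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class TimelineIndex:
    """Toy actor -> posting list index, newest activity first (illustrative only)."""

    def __init__(self) -> None:
        # Each posting list holds (-timestamp, activity_id) tuples, so ascending
        # sort order is reverse-chronological order.
        self._postings: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def add(self, actor: str, timestamp: int, activity_id: str) -> None:
        """Insert an activity into the actor's posting list, keeping it sorted."""
        bisect.insort(self._postings[actor], (-timestamp, activity_id))

    def latest(self, actor: str, count: int) -> List[str]:
        """Return up to `count` of the actor's most recent activity ids."""
        return [aid for _, aid in self._postings.get(actor, [])[:count]]


index = TimelineIndex()
index.add("member:1", 100, "a1")
index.add("member:1", 300, "a3")
index.add("member:1", 200, "a2")
index.add("school:2", 150, "b1")
```

Because the index is keyed by actor rather than by document, a read for one actor touches exactly one posting list.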

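[Figure: FollowFeed clusters. Components shown: Activities stream; Partitioner cluster; Storage cluster with partition 1, partition 2, ..., partition N holding actors m1, m2, c1, m3, m4, m5, m6, s7, s8; Query cluster. Example: "Get Updates for viewer with network {m1, c1, m6}" is split into the per-partition requests {m1, c1} and {m6}.]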
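The split of a viewer's network into per-partition requests can be sketched like this. The hash-based placement is an assumption (the slides do not say how actors map to partitions), and the function names are invented for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(actor: str, num_partitions: int) -> int:
    """Map an actor to a partition (assumed scheme: CRC32 modulo partition count).

    zlib.crc32 is used instead of hash() because it is stable across processes.
    """
    return zlib.crc32(actor.encode("utf-8")) % num_partitions


def fan_out(network: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group the actors in a viewer's network by the partition that stores them."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for actor in network:
        groups[partition_for(actor, num_partitions)].append(actor)
    return dict(groups)


# One sub-request would be sent per key of this mapping.
requests = fan_out(["m1", "c1", "m6"], num_partitions=4)
```

Each storage node then only has to look up the actors it owns, and the query node merges the partial results.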

Features Pipeline

FollowFeed Storage - Layers

Timeline Index

Filtering

Ranking

topK

FollowFeed

select activities
where (required)
    actor IN <my network>
    TimeRange between [start, end] OR count = <C>
where (optional)
    (actor type | verb type | object type) in <X>
    Visibility is (connections-only OR followees-only OR public)
sort by time OR relevance
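The query shape above, together with the filtering, ranking and topK layers, could look roughly like this in code. It is a simplified sketch: `Record`, `query` and their parameters are hypothetical names, only the verb-type filter is shown from the optional type filters, and relevance is whatever scoring function the caller passes in.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Record:
    actor: str
    verb: str
    visibility: str
    timestamp: int


def query(
    records: Iterable[Record],
    network: Set[str],
    time_range: Optional[Tuple[int, int]] = None,
    count: Optional[int] = None,
    verbs: Optional[Set[str]] = None,
    visibilities: Optional[Set[str]] = None,
    score: Optional[Callable[[Record], float]] = None,
) -> List[Record]:
    """Filter by the required and optional predicates, then rank and take top K."""
    # Sort by time unless a relevance scoring function is supplied.
    key = score if score is not None else (lambda r: r.timestamp)
    matches = (
        r for r in records
        if r.actor in network
        and (time_range is None or time_range[0] <= r.timestamp <= time_range[1])
        and (verbs is None or r.verb in verbs)
        and (visibilities is None or r.visibility in visibilities)
    )
    if count is None:
        return sorted(matches, key=key, reverse=True)
    # heapq.nlargest keeps only `count` candidates in memory while scanning.
    return heapq.nlargest(count, matches, key=key)
```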

Why embedded?

Bring computation closer to data

Allows scoring of tens of millions of records per second

Less data transferred over the wire

Colocation of relevance features and data

Document features

RocksDB as embedded database

Open sourced by Facebook in 2013

Operational Story of FollowFeed

Traffic

Metrics

Performance testing during development

[Figure: replaying production traffic for performance testing. Components shown: Legacy System; Access logs; Log collection system (Log collector, Disk); Log processor and replay; FollowFeed. The live traffic is read only.]

Development process

Code

Check in

Staging (Integration testing)

Dark Canary

Canary

Production

Dark Canary

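[Figure: dark canary setup. A Request dispatcher sends each Request to a FollowFeed live node and returns its Response. A copy of the request is also sent to a FollowFeed dark canary node, from which metrics are collected.]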
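A request dispatcher with a dark canary can be sketched as below. The class name, the callables and the metric names are all hypothetical, and the dark call is made synchronously only to keep the example short. A real dispatcher would most likely mirror the request asynchronously so the canary cannot add latency.

```python
import copy
from typing import Any, Callable


class RequestDispatcher:
    """Serve from the live node; mirror a copy of each request to the dark canary."""

    def __init__(
        self,
        live: Callable[[Any], Any],
        dark: Callable[[Any], Any],
        record_metric: Callable[[str], None],
    ) -> None:
        self._live = live
        self._dark = dark
        self._record_metric = record_metric

    def handle(self, request: Any) -> Any:
        # Only the live node's response is ever returned to the caller.
        response = self._live(request)
        try:
            # The dark canary gets its own copy; its response is discarded.
            self._dark(copy.deepcopy(request))
            self._record_metric("dark_canary.ok")
        except Exception:
            # A failing canary must never affect production traffic.
            self._record_metric("dark_canary.error")
        return response
```

Since the replayed traffic is read only, mirroring it to a node running new code has no side effects on stored data.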

No Alerts Left Behind

● Meaningful thresholds

● Each alert has an action

● Alerts only on external symptoms

● An underlying issue triggers only a single alert

Backup

[Figure: backup flow. A Cron Script makes a call to the backup API of a FollowFeed Node and copies the result to HDFS, which holds Backup 1, Backup 2, ..., Backup N.]

Restore

[Figure: restore flow. A New FollowFeed Node is populated by a copy from HDFS, which holds Backup 1, Backup 2, ..., Backup N.]

Rebalancing

Rebalance script

With 3 boxes:

Box 1: Partitions 0..19

Box 2: Partitions 20..39

Box 3: Partitions 40..59

With 4 boxes:

Box 1: Partitions 0..14

Box 2: Partitions 15..29

Box 3: Partitions 30..44

Box 4: Partitions 45..59
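The partition ranges on this slide follow from splitting 60 partitions into contiguous, near-equal blocks. The sketch below reproduces that arithmetic and lists which partitions change boxes, assuming the rebalance goes from three boxes to four. The function names are invented here, and this is not the actual rebalance script.

```python
from typing import Dict, List, Tuple


def assign_ranges(num_partitions: int, num_boxes: int) -> List[range]:
    """Split partitions 0..num_partitions-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_partitions, num_boxes)
    ranges = []
    start = 0
    for box in range(num_boxes):
        # The first `extra` boxes take one additional partition each.
        size = base + (1 if box < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def moves(before: List[range], after: List[range]) -> Dict[int, Tuple[int, int]]:
    """Return {partition: (old_box, new_box)} for partitions that change boxes.

    Boxes are numbered from 1 to match the slide.
    """
    old_owner = {p: box for box, r in enumerate(before, start=1) for p in r}
    return {
        p: (old_owner[p], box)
        for box, r in enumerate(after, start=1)
        for p in r
        if old_owner[p] != box
    }


three_boxes = assign_ranges(60, 3)  # 0..19, 20..39, 40..59
four_boxes = assign_ranges(60, 4)   # 0..14, 15..29, 30..44, 45..59
to_move = moves(three_boxes, four_boxes)
```

With contiguous ranges, 30 of the 60 partitions change owner in this example (15..19, 30..39 and 45..59).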

Rebalancing

Extension of restore script

[Figure: rebalancing via restore. New FollowFeed Nodes 1, 2, 3, ..., N each copy from HDFS, which holds Backup 1, Backup 2, Backup 3, ..., Backup N.]

Questions

Appendix

Timeline storage structure

Blob 1 Blob 2 Blob 3

Blob layout: Blob header | Update N | Update N-1 | Update N-2 | …… | Next blob key
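Read as a linked list of blobs, the timeline structure above can be traversed newest first by following each blob's next-blob key. The sketch uses a plain dictionary as the key-value store. `Blob`, `iter_timeline` and the key names are assumptions for illustration, not the real on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Blob:
    """One blob: a header, updates stored newest first, and a link to the next blob."""
    header: dict = field(default_factory=dict)
    updates: List[str] = field(default_factory=list)
    next_blob_key: Optional[str] = None


def iter_timeline(store: Dict[str, Blob], head_key: str) -> Iterator[str]:
    """Yield updates in reverse-chronological order by walking the blob chain."""
    key: Optional[str] = head_key
    while key is not None:
        blob = store[key]
        yield from blob.updates
        key = blob.next_blob_key


# Three chained blobs for one actor (made-up keys and update ids).
store = {
    "m1:blob1": Blob(updates=["u9", "u8", "u7"], next_blob_key="m1:blob2"),
    "m1:blob2": Blob(updates=["u6", "u5", "u4"], next_blob_key="m1:blob3"),
    "m1:blob3": Blob(updates=["u3", "u2", "u1"]),
}
```

Because the iterator is lazy, a reader that needs only the most recent few updates usually touches just the first blob.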
