software developer and architecture @ linkedin (qcon sf 2014)


Upload: siddharth-anand

Post on 02-Jul-2015


Page 1: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development & Arch @ LinkedIn

Sid Anand

QCon SF 2014

@r39132

Page 2: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

About Me

Current Life…

Chief Architect @ ClipMine, a video discovery company

QCon SF Program Committee member

Dad to a very energetic 2-year-old boy

Previous Life…

Architect in Search and Distributed Data @ LinkedIn

Cloud Data Architect @ Netflix

VP Engineering at Etsy

Software Developer at eBay

Page 3: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

A Closer Look @ LinkedIn

Page 6: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn

Then

• Created in 2002 in Reid Hoffman’s living room

• In its first month of operation, LinkedIn added 4500 members!

Now

• 332M members in 200 countries

• 2 members sign up every second

• >60% of members overseas

• In Q3’14, 75% of new members came from overseas

• Fastest growing demographic is not geographic, it’s students!

• Students are already > 10% of the user base, and growing!

Page 7: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn

7

*

Member-growth started to ramp up during 2011, when we IPO’d

• 2010 : 55M

• 2011 : 90M (IPO)

• 2012 : 145M

• Q3’14 : 332M

(note : numbers reflect start of year)

We added roughly as many members in 2010 as over the previous 6 years!

@r39132 7

Page 8: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn

8

*

***

Employee-growth also started to ramp up during 2011

• 2010 : 500

• 2011 : 1K (IPO)

• 2012 : 2100

• Q3’14: 6K (25% in Engineering)

(note : numbers reflect start of year)

@r39132 8

Page 9: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

9@r39132 9

Page 10: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Alan Shepard

• 2nd man in space

• 5th person to walk on the moon!

• 1st person to hit a golf ball on the moon!

Page 11: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn

When asked by reporters what he thought about while awaiting liftoff, he replied: "The fact that every part of this ship was built by the lowest bidder"

Page 12: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

How did LinkedIn scale for

company and member growth?

12@r39132 12

Page 13: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Challenges

13@r39132 13

Page 14: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development : Challenges

Circa 2011

• On my first day at LinkedIn, I felt pretty excited!

Linux Desktop

• 8 Core

• 64GB RAM

Mac Air

Page 16: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development : Challenges

Circa 2011

• Then I tried to compile the code on my laptop!

Linux Desktop

• 8 Core

• 64GB RAM

Mac Air

Page 17: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development : Challenges

Circa 2011

• 300+ code projects in a single SVN Repo

• SVN checkout world & go-to-lunch

• Needed a server-grade machine to compile it!

• Ant build (world) & go-make-espresso

• Almost every WAR was built from source, not from intermediate JARs

• To test your code locally, you needed to locally deploy every service that your code depended on (maybe 20)!

• So, yes, you needed a machine that typically lives in your data center!

Page 18: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

18

Circa 2011

• Assume that your code is now

• Written

• Compiled

• Locally Tested

• What Next?

Software Development : Challenges

@r39132

Page 19: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development : Challenges

Circa 2011

• 500+ developers were checking code into the master branch on the single repo!

• So, someone broke master every day!

• So

• 3 hours to write, build, and locally test code

• 3 days to commit it!

Page 20: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

20

Software Development : Challenges

@r39132

Page 21: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development : Challenges

Now (Solved)

• Do what the open-source world does, with some improvements!

• Break the monolithic repo into many individual Git Repos!

• Have WARs depend on intermediate JARs: do not build the world!

• Do not deploy the world for local testing: just connect your Dev machine to a test environment!

• What are the improvements?
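Once WARs depend on intermediate JARs, a change to one JAR only requires rebuilding the WARs that depend on it. The sketch below illustrates that idea under stated assumptions: the artifact names and the `DEPENDS_ON` map are hypothetical, and this is not LinkedIn's actual build tooling.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: artifact -> artifacts it depends on.
DEPENDS_ON = {
    "profile-war": ["profile-api-jar", "common-util-jar"],
    "search-war": ["search-client-jar", "common-util-jar"],
    "profile-api-jar": ["common-util-jar"],
    "search-client-jar": [],
    "common-util-jar": [],
}

def wars_to_rebuild(changed, depends_on):
    """Return the WARs that transitively depend on the changed artifact."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for artifact, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(artifact)
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(a for a in seen if a.endswith("-war"))
```

Here a change to `search-client-jar` touches only `search-war`, while a change to `common-util-jar` touches both WARs.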

Page 22: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Life Cycle

22@r39132 22

Page 23: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Code Reviews

1. Alice commits code to Git

2. Alice sends a Review Board request to Bob & Cathy, owners of the files!

3. Both Bob & Cathy give ship-its

4. Alice amends her commit message with:

RB=<review board id>

BUILD-WAR=<list of wars to build>
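A server-side check needs to read the RB and BUILD-WAR tags back out of the commit message. The following is a minimal sketch of such a parser. The slide does not say how the WAR list is delimited, so the comma separator, the function name, and the sample message are all assumptions.

```python
import re

def parse_commit_tags(message):
    """Extract the RB and BUILD-WAR tags from a commit message."""
    tags = {}
    for line in message.splitlines():
        match = re.match(r"^(RB|BUILD-WAR)=(.*)$", line.strip())
        if match:
            tags[match.group(1)] = match.group(2).strip()
    rb_id = tags.get("RB")
    # Assumption: the WAR list is comma-separated.
    wars = [w.strip() for w in tags.get("BUILD-WAR", "").split(",") if w.strip()]
    return rb_id, wars

# Hypothetical commit message for illustration.
message = "Fix profile headline truncation\n\nRB=12345\nBUILD-WAR=profile-war,search-war"
rb_id, wars = parse_commit_tags(message)
```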

Page 24: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Code Push (Git Push)

1. Alice pushes code to our Gitorious server, where the following verifications run:

   1. Pre-push Sanity Checks! These must pass or the push is rejected!

      1. Have all owners of the changed files given ship-its?

      2. Does the code build?

   2. For JAR builds, also build upstream WARs!

   3. Run Integration Tests!
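The first sanity check (have all owners of the changed files given ship-its?) can be sketched as a set difference. The ownership map, file paths, and function names below are hypothetical; the real check runs inside LinkedIn's push pipeline, whose interface the slides do not show.

```python
def missing_ship_its(changed_files, owners, ship_its):
    """Return owners of changed files who have not yet given a ship-it."""
    required = set()
    for path in changed_files:
        required.update(owners.get(path, ()))
    return sorted(required - set(ship_its))

def push_allowed(changed_files, owners, ship_its):
    """The push passes this check only when no required ship-it is missing."""
    return not missing_ship_its(changed_files, owners, ship_its)

# Hypothetical ownership data for illustration.
OWNERS = {
    "profile/Headline.java": ["bob"],
    "profile/Skills.java": ["bob", "cathy"],
}
changed = ["profile/Headline.java", "profile/Skills.java"]
```

With only Bob's ship-it, `missing_ship_its(changed, OWNERS, ["bob"])` reports Cathy and the push would be rejected.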

Page 25: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

QA Test / Staging

1. Assuming that all checks passed, the WAR is now available

2. Our system automatically deploys all WARs to test servers

3. QA verifies the new builds

Page 26: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Production - Canary

1. Service owner Dave canaries the new WAR

2. Our EKG system then compares the canary machine to one control machine for 1 hour of production traffic for the following:

   1. CPU, Memory increase

   2. Fan-in/Fan-out increase

   3. Error rate increase

   4. Latency increase
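The canary-versus-control comparison can be sketched as a per-metric relative-increase check. This is an illustration only: the metric names, the sample values, and the 10% threshold are assumptions, and the slides do not describe how EKG actually scores a canary.

```python
def canary_regressions(canary, control, max_increase=0.10):
    """Return metrics where the canary exceeds control by more than max_increase."""
    regressions = {}
    for metric, baseline in control.items():
        observed = canary[metric]
        if baseline == 0:
            increase = float("inf") if observed > 0 else 0.0
        else:
            increase = (observed - baseline) / baseline
        if increase > max_increase:
            regressions[metric] = increase
    return regressions

# Hypothetical one-hour averages for the control and canary machines.
control = {"cpu_pct": 40.0, "memory_mb": 2000.0, "fan_out": 50.0,
           "error_rate": 0.010, "latency_ms": 100.0}
canary = {"cpu_pct": 42.0, "memory_mb": 2050.0, "fan_out": 50.0,
          "error_rate": 0.010, "latency_ms": 130.0}
report = canary_regressions(canary, control)
```

In this sample only latency is flagged, since it rose 30% while every other metric stayed within the assumed threshold.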

Page 27: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Software Development

Production - Promotion

1. Service owner Dave reviews the EKG report

2. If it looks acceptable, he promotes the build to the rest of the cluster in all data centers

Page 28: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

How did LinkedIn scale for

company and member growth?

28@r39132 28

Page 29: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Architectural

Practices

29@r39132 29

Page 30: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture

[Diagram: Web Servers, Oracle]

Prototypical Use Case

• A member updates her profile with new skills, job title, and education

• She also accepts a connection request from another member

Behind the scenes

• Web servers commit data to Oracle

• What Happens Next?

Page 31: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture

[Diagram: Web Servers, Oracle]

What Happens Next?

Profile Updates

• She should become instantly searchable by her new skills, job title, & education!

• New groups and job ads should be recommended to her

Connection Updates

• The news feed should instantly reflect content updates from her new connection!

• Also, based on the new connection, the PYMK widget should discover a new 2nd degree neighborhood!

Page 32: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture

[Diagram: Web Servers (writers), Oracle, Databus, Search, Caches, Graph, Recommender Systems (PYMK, Jobs), Downstream Streams, DW]

Page 33: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn : Architecture

We also have a data pipeline to capture high-throughput events that we need to count!

Databases are not a good place to do high-throughput atomic counting!

Kafka is!

• This is typically used for ranking signals

• E.g. counting member page views to determine who is “hot”
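The page-view example amounts to aggregating a stream of events into per-member counts. The sketch below uses a plain Python list as a stand-in for the event stream; it does not use any Kafka client API, and the event field names are hypothetical.

```python
from collections import Counter

def hottest_members(page_view_events, top_n=2):
    """Count page views per viewed member and return the most-viewed ones."""
    counts = Counter(event["viewed_member_id"] for event in page_view_events)
    return counts.most_common(top_n)

# Hypothetical page-view events, as a consumer might read them off a topic.
events = [
    {"viewer_id": 1, "viewed_member_id": 42},
    {"viewer_id": 2, "viewed_member_id": 42},
    {"viewer_id": 3, "viewed_member_id": 7},
    {"viewer_id": 1, "viewed_member_id": 42},
    {"viewer_id": 4, "viewed_member_id": 7},
    {"viewer_id": 5, "viewed_member_id": 99},
]
```

The counting happens in the consumer, so the source-of-truth database never sees a per-view write.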

Page 34: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture

[Diagram: Web Servers (writers), Oracle, Kafka, Databus, Search Systems, Caches, Graph Systems, Recommender Systems, Downstream Streams, DW]

Page 35: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture : Single Data Center!

@r39132 35

Page 36: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn : Architecture : Single Data Center!

@r39132 36

Page 37: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn : Architecture : Multi-data Center Project

@r39132 37

Page 38: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture : Rule 1

@r39132 38

Partition your user base across the data centers!

e.g. using Akamai GTM

Page 39: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture : Rule 1

@r39132 39

Problem!

User 1 (mapped to DC1) updates his profile! How will User 2 (mapped to DC2) see it?

Page 42: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture : Rule 2

Link your data centers together at the data fabric level!

Not a new concept! Cassandra has been doing it for a few years now in the OLTP database space!

LinkedIn’s Sources of Truth

• We have to make both work across multiple data centers!

• Oracle is fairly easy : we use Oracle GoldenGate!

• Kafka is also pretty easy!

Page 47: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn : Kafka Multi-Colo

[Diagram: Kafka Data Center 1 and Kafka Data Center 2. Each data center shows a Producer, a Kafka Local cluster, a Kafka Global cluster, a Consumer of Local Events, and a Consumer of Global Events.]
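The diagram names a local and a global Kafka cluster per data center but the transcript does not spell out the data flow. The sketch below assumes one plausible reading: producers write only to their own local cluster, and a mirroring step copies every local log into each data center's global cluster. The class, the `mirror` function, and the in-memory lists are hypothetical stand-ins, not Kafka APIs.

```python
class DataCenter:
    def __init__(self, name):
        self.name = name
        self.local = []    # stand-in for the "Kafka Local" cluster
        self.global_ = []  # stand-in for the "Kafka Global" cluster

    def produce(self, event):
        """Producers only write to their own data center's local cluster."""
        self.local.append((self.name, event))

def mirror(data_centers):
    """Copy every local log into every global log (assumed mirroring step)."""
    for target in data_centers:
        target.global_ = [e for source in data_centers for e in source.local]

dc1, dc2 = DataCenter("DC1"), DataCenter("DC2")
dc1.produce("member 1 updated profile")
dc2.produce("member 2 viewed member 1")
mirror([dc1, dc2])
```

Under this reading, a consumer of local events in DC1 sees only DC1's event, while a consumer of global events in either data center sees both.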

Page 48: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Architecture : Rule 3

@r39132 48

Don’t make any web service calls between data centers!

It kills latency, which kills availability!

Page 49: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn : Architecture

@r39132 49

Page 50: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

How did LinkedIn scale for

company and member growth?

50@r39132 50

Page 51: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search

51@r39132 51

Page 52: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search

Why is Search important to LinkedIn?

• Search is a significant income driver!

• 332M members that recruiters pay to find! (Recruiter Search)

• 2M+ jobs that companies pay to list so you can find them! (Job Search)

Page 53: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

What Makes LinkedIn Search

Unique?

53@r39132 53

Page 54: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

54

LinkedIn Search : Federated

@r39132

Page 58: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search : Federated

• We index many entities

• members, jobs, companies, groups, universities, articles, slides, etc.

• These are separate (vertical) search-engines!

• When a user enters “sr software engineer”, which index should we look in?

• Jobs, members, groups?

• Can we simply send the request to all of the search engines and then show the most relevant results?

• No

• Ranks (scores) are not comparable across verticals

• What if we pick a vertical based on a user feature?

• Job seeker sees jobs, recruiter sees members

• Intent Detection : done by Federator
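Picking a vertical from a user feature can be sketched as a simple lookup in front of the vertical search engines. The user types, the default vertical, and the lambda-based "engines" below are hypothetical; the slides do not describe the Federator's actual intent model.

```python
# Hypothetical mapping from a user feature to the preferred vertical.
VERTICAL_BY_USER_TYPE = {"job_seeker": "jobs", "recruiter": "members"}

def pick_vertical(user_type, default="members"):
    """Choose which vertical to query based on a user feature."""
    return VERTICAL_BY_USER_TYPE.get(user_type, default)

def federated_search(query, user_type, verticals):
    """Route the query to one vertical instead of merging incomparable scores."""
    vertical = pick_vertical(user_type)
    return vertical, verticals[vertical](query)

# Stand-in search engines, one per vertical.
verticals = {
    "jobs": lambda q: [f"job posting matching '{q}'"],
    "members": lambda q: [f"member profile matching '{q}'"],
}
```

The same query, “sr software engineer”, goes to the jobs vertical for a job seeker and to the members vertical for a recruiter.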

Page 60: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search : Query Rewriting

@r39132 60

• Say a recruiter searches for “sr software eng”

• There are 20+ ways to represent this title

• senior swe

• sr swe

• senior software engineer

• To solve this, we can use a title standardizer, though not every title may have a canonical form!

• If a standardized title exists, we can rewrite the user query

• title:sr AND title:software AND title:eng std_title:sswe234

Page 61: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search : Query Rewriting

@r39132 61

• Say a recruiter searches for “sr software eng”

• There are 20+ ways to represent this title

• senior swe

• sr swe

• senior software engineer

• To solve this, we developed a title standardizer!

• If a standardized title exists, we can rewrite the user query

• title:sr AND title:software AND title:eng std_title:sswe234

• Query Rewriting expands the search space through methods such as synonym expansion and spell correction, so we need it!
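The title rewrite shown on the slide can be sketched as a lookup against a standardizer table. The table contents and function name are hypothetical (only the ID `sswe234` comes from the slide), and the output mirrors the slide's query string as written, including the absence of an explicit operator before `std_title`.

```python
# Hypothetical standardizer table; "sswe234" is the ID shown on the slide.
STANDARD_TITLES = {
    "sr software eng": "sswe234",
    "senior swe": "sswe234",
    "sr swe": "sswe234",
    "senior software engineer": "sswe234",
}

def rewrite_title_query(user_query):
    """Rewrite a title search, adding the standardized title when one exists."""
    terms = user_query.lower().split()
    rewritten = " AND ".join(f"title:{t}" for t in terms)
    std_id = STANDARD_TITLES.get(" ".join(terms))
    if std_id is not None:
        rewritten += f" std_title:{std_id}"
    return rewritten
```

A title with no canonical form simply falls back to the plain term query.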

Page 62: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search : Flexible Scoring

@r39132 62

• We index many entities!

• Companies, Members, Universities, etc…

• We use different scoring formulas and signals for each vertical

• We need a way to easily plug-in different custom scorers!
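One common way to make scorers pluggable is a registry keyed by vertical. The sketch below assumes that design; the signals, weights, and document fields are invented for illustration and are not Galene's scoring interface.

```python
SCORERS = {}

def scorer(vertical):
    """Decorator that registers a scoring function for one vertical."""
    def register(fn):
        SCORERS[vertical] = fn
        return fn
    return register

@scorer("members")
def score_member(doc, query):
    # Hypothetical signal: text match plus a connection-count boost.
    return doc["match"] + 0.01 * doc["connections"]

@scorer("companies")
def score_company(doc, query):
    # Hypothetical signal: text match plus a follower-count boost.
    return doc["match"] + 0.001 * doc["followers"]

def rank(vertical, docs, query):
    """Rank documents with whichever scorer is plugged in for the vertical."""
    score = SCORERS[vertical]
    return sorted(docs, key=lambda d: score(d, query), reverse=True)

member_docs = [
    {"id": "a", "match": 1.0, "connections": 10},
    {"id": "b", "match": 0.9, "connections": 500},
]
```

Adding a new vertical then means registering one more function rather than changing the ranking code.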

Page 64: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

LinkedIn Search : Open Source

• Leading open source alternatives (e.g. Lucene, Elasticsearch, Solr) do not offer these!

• Search Federation

• Pluggable Query Rewriting

• Pluggable and Flexible Scoring

• They DO offer some distributed system management, which we will unfortunately have to re-invent

• So, we created Galene, LinkedIn’s new search architecture!

https://engineering.linkedin.com/search/did-you-mean-galene

Page 65: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Questions?

Page 66: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Bonus Slides

66@r39132 66

Page 67: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture

67@r39132 67

Page 68: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture : Querying

[Diagram: Browser, Frontend, Federator, Vertical Broker, Vertical Search Node, Other Verticals]

• Query Rewriting (Pluggable)

• Scatter-gather across shards

• Lucene (optionally sharded)

• Scoring (Pluggable)

• Query Intent Detection

• Result Blending
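Scatter-gather across shards means sending the query to every shard of a vertical and merging the partial top-k lists into one. The sketch below runs the shards sequentially over in-memory dictionaries with precomputed scores; a real broker would query shards in parallel, and all names and data here are hypothetical.

```python
import heapq

def search_shard(shard, query, k):
    """Return the top-k (score, doc_id) hits of one shard for the query."""
    hits = [(score, doc_id) for doc_id, (text, score) in shard.items() if query in text]
    return heapq.nlargest(k, hits)

def scatter_gather(shards, query, k):
    """Query every shard, then merge the partial results into a global top-k."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return heapq.nlargest(k, partial)

# Hypothetical shards of one vertical: doc_id -> (text, precomputed score).
shards = [
    {"m1": ("senior software engineer", 0.9), "m2": ("product manager", 0.8)},
    {"m3": ("software engineer", 0.7), "m4": ("staff software engineer", 0.95)},
]
```

Merging by score is safe here because all shards belong to the same vertical and share one scoring formula.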

Page 69: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture : Indexing (Offline)

[Diagram: Browser, Frontend, Federator, Vertical Broker, Vertical Search Node, Hadoop, Vertical Indexer Node, Index Distribution Service]

Offline Index Building and Distribution

• Batch-oriented, built daily

• Builds offline ranking and rewriting models

• Rebuilds indexes when new fields are added

Page 70: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture

[Diagram: Browser, Frontend, Federator, Vertical Broker, Vertical Search Node, Hadoop, Vertical Indexer Node, Index Distribution Service]

Offline Index Building and Distribution

• BitTorrent-based Index Distribution Service

• Pushes new indexes and models to running services

Page 71: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture

[Diagram: Browser, Frontend, Federator, Vertical Broker, Vertical Search Node, Vertical Live Updater, Hadoop, Vertical Indexer Node, Index Distribution Service, Kafka, Databus, Samza]

Online Index Updates

• Online (near-real-time) indexer

• Updates indexes between Hadoop builds
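Updating indexes between offline builds can be pictured as a live overlay on top of the last base index. The sketch below assumes that layering; the class and method names are hypothetical, and clearing the overlay on every new base build is a simplification that ignores updates arriving while the build runs.

```python
class VerticalIndex:
    def __init__(self, base):
        self.base = dict(base)  # index from the last offline (Hadoop) build
        self.live = {}          # updates received since that build

    def apply_live_update(self, doc_id, doc):
        """Record a near-real-time update on top of the base index."""
        self.live[doc_id] = doc

    def get(self, doc_id):
        """Live updates take precedence over the offline build."""
        return self.live.get(doc_id, self.base.get(doc_id))

    def load_new_base(self, base):
        """A fresh offline build replaces the base and clears the overlay."""
        self.base, self.live = dict(base), {}

index = VerticalIndex({"m1": {"title": "engineer"}})
index.apply_live_update("m1", {"title": "senior engineer"})
```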

Page 72: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Galene Architecture

[Diagram: Browser, Frontend, Federator, Vertical Broker, Vertical Search Node, Vertical Live Updater, Hadoop, Vertical Indexer Node, Index Distribution Service, Kafka, Databus, Samza]

Periodic Index Optimization

• Snapshots live data into a compact format

• Sends the ss-index to search nodes over BitTorrent

Page 73: Software Developer and Architecture @ LinkedIn (QCon SF 2014)

Questions?