LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

LinkedIn Segmentation & Targeting Platform: A Big Data Application. Hadoop Summit, June 2013. Hien Luu, Sid Anand. ©2013 LinkedIn Corporation. All Rights Reserved.

Upload: siddharth-anand

Post on 11-May-2015


DESCRIPTION

This presentation was given at Hadoop Summit 2013 on June 26, 2013, by Sid Anand and Hien Luu of LinkedIn.

TRANSCRIPT

Page 1: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

©2013 LinkedIn Corporation. All Rights Reserved.

LinkedIn Segmentation & Targeting Platform: A Big Data Application

Hadoop Summit, June 2013

Hien Luu, Sid Anand

Page 2: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

About Us

Hien Luu, Sid Anand

Page 3: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Our mission

Connect the world’s professionals to make them more productive and successful

Page 4: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Over 200M members and counting

LinkedIn Members (Millions), by year: 2004: 2; 2005: 4; 2006: 8; 2007: 17; 2008: 32; 2009: 55; 2010: 90; 2011: 145; 2012: 200+

The world’s largest professional network

Growing at more than 2 members/sec

Source: http://press.linkedin.com/about

Page 5: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

• >88%: Fortune 100 companies that use LinkedIn Talent Solutions to hire
• >2.9M: Company Pages
• >5.7B: professional searches in 2012
• 19: languages
• >30M: students and NCGs, the fastest growing demographic

The world’s largest professional network

Over 64% of members are now international

Source: http://press.linkedin.com/about

Page 6: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Other Company Facts

• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world

Source: http://press.linkedin.com/about

Page 7: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Agenda

Company Overview

Big Data @ LinkedIn

The Segmentation & Targeting Problem

Solution : LinkedIn Segmentation & Targeting Platform

Q & A

Page 8: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Big Data @ LinkedIn


Page 9: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

LinkedIn : Big Data Story

Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Offline Data Infrastructure

[Diagram. Tiers: On-line, Near-line, Off-line. Labels: Updates, Web Serving, Oracle or Espresso, Data Streams, Teradata]

Page 10: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Big Data Story : On-line Data

On-line Data Infrastructure

• Supports typical OLTP requirements
  • Highly concurrent R/W access
  • Transactional guarantees
  • Back-up & Recovery
• Supports a central LinkedIn Data Principle!
  • “All data everywhere”
• All OLTP databases need to provide a time-line consistent change stream
• For this, we developed and open-sourced Databus!

[Diagram. Tier: On-line. Labels: Updates, Web Serving, Oracle or Espresso]

Page 11: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Big Data Story : On-line Data

[Diagram labels: Updates, Oracle or Espresso, Data Change Events, Standardization, Search Index, Graph Index, Read Replicas]

A user updates the company, title, & school on his profile. He also accepts a connection.

The write is made to an Oracle or Espresso master, and Databus replicates it:
• The profile change is applied to the Standardization service
  • E.g. the many forms of IBM are canonicalized for search-friendliness
• ... and to the Search Index
  • Recruiters can find you immediately by new keywords
• The connection change is applied to the Graph Index service
  • The user can now start receiving feed updates from his new connections
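The replication flow on this slide can be sketched as an ordered change log that each downstream service reads at its own pace. This is a minimal in-memory illustration, not Databus's actual API: `ChangeStream`, `ChangeEvent`, `Consumer`, and the `scn` field are hypothetical names chosen for the sketch, and the member IDs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    scn: int        # hypothetical sequence number: position in commit order
    source: str     # e.g. "profile" or "connection"
    payload: dict


@dataclass
class ChangeStream:
    """Append-only log of committed changes, kept in commit order."""
    events: list = field(default_factory=list)

    def append(self, source, payload):
        event = ChangeEvent(len(self.events) + 1, source, payload)
        self.events.append(event)
        return event

    def read_since(self, scn):
        # Consumers ask for everything after the last event they saw.
        return [e for e in self.events if e.scn > scn]


class Consumer:
    """A downstream service that applies only the sources it cares about."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = set(sources)
        self.last_scn = 0
        self.applied = []

    def poll(self, stream):
        for event in stream.read_since(self.last_scn):
            if event.source in self.sources:
                self.applied.append(event.payload)
            # Advance the checkpoint even for skipped events.
            self.last_scn = event.scn


stream = ChangeStream()
stream.append("profile", {"member": 42, "company": "IBM"})
stream.append("connection", {"member": 42, "connected_to": 7})

search_index = Consumer("search-index", ["profile"])
graph_index = Consumer("graph-index", ["connection"])
for consumer in (search_index, graph_index):
    consumer.poll(stream)
```

Because every consumer reads the same ordered log and keeps its own checkpoint, each one sees changes in commit order, which is the property the "time-line consistent change stream" bullet asks for.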

Page 12: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Big Data Story : On-line Data

Databus streams also update Hadoop!

[Diagram labels: Updates, Oracle or Espresso, Data Change Events, Standardization, Search Index, Graph Index, Read Replica]

Page 13: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Big Data Story : Near-line & Off-line Data

2 Main Sources of Data @ LinkedIn
• User-provided data
  • e.g. Member Profile data (employment, education history, endorsements)
• Tracking data via web site instrumentation
  • e.g. pages viewed, email opened/sent, social gestures: posts/likes/shares

[Diagram labels: Updates, Oracle or Espresso, Databus, Web Servers, Kafka, Teradata]

Page 14: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

The Segmentation & Targeting Problem

Page 15: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Page 16: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting Attribute types

Bhaskar Ghosh

Page 17: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Step 1: Take some information about users

Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T

Step 2: Provide some targeting criteria for a new promotion

Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"

Members 1 & 3

Step 3: Target them for a different email campaign (promotion_X2)

Page 18: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Step 1: Take some information about users (Attributes)

Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T

Step 2: Provide some targeting criteria for a new promotion (Segment Definition)

Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"

Members 1 & 3 (Segment)

Step 3: Target them for a different email campaign (promotion_X2)
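The three-step example above can be written out as a short sketch. The rows are copied from the slide's table; the field names (`member_id`, `join_date`, `country`, `responded_x1`) and the function name are assumptions made for illustration, and dates are read as MM/DD/YYYY, which is the reading under which the slide's answer of members 1 and 3 holds.

```python
from datetime import date

# Attributes: one record per member (values copied from the slide's table).
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def segment_definition(member):
    """Targeting criteria from Step 2, as a predicate over one member's attributes."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Segment: the members that satisfy the segment definition.
segment = [m["member_id"] for m in members if segment_definition(m)]
print(segment)  # [1, 3]
```

Member 2 is excluded by country and member 4 by both join date and prior response, leaving the segment that Step 3 targets with promotion_X2.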

Page 19: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting


Problem Definition

• The business wants to launch new campaigns often

• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes

• The attributes often need to be computed to fulfill the targeting criteria

• This data resides on Hadoop or Teradata (TD)

• The business is most comfortable with SQL-like languages

Page 20: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting Solution


Page 21: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Attribute Computation Engine

Attribute Serving Engine

Page 22: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability

Page 23: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Attribute computation

[Diagram labels: ~225M, PB, TB, TB, ~240]

Page 24: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

Attribute Portal Web Application

Attribute & Definition Metadata

Page 25: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

Attribute & Definition Metadata

Executors, each labeled REST in the diagram:
• TD Executor
• Hive Executor
• Pig Executor

Page 26: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

M/R Stitcher

[Diagram paths: /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4, /path/lnkd_big_table]

DataLoader

Attribute consolidation & availability
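One plausible reading of the stitcher diagram is that several per-attribute datasets are merged into a single wide row per member. The slide does not show the job itself, so the following is only an in-memory sketch of that idea with hypothetical record shapes and a hypothetical `stitch` function; the real system runs as a MapReduce job over files on Hadoop.

```python
from collections import defaultdict


def stitch(datasets):
    """Merge per-member attribute records from several datasets into one wide row each."""
    big_table = defaultdict(dict)
    for records in datasets:
        for record in records:
            # "Map" step: key every record by member_id.
            attrs = {k: v for k, v in record.items() if k != "member_id"}
            # "Reduce" step: attributes sharing a member_id land in the same row.
            big_table[record["member_id"]].update(attrs)
    return dict(big_table)


# Hypothetical stand-ins for the input datasets in the diagram.
dataset1 = [{"member_id": 1, "country": "FR"}, {"member_id": 2, "country": "BE"}]
dataset2 = [{"member_id": 1, "is_recruiter": True}]

big_table = stitch([dataset1, dataset2])
```

Members missing from a dataset simply lack that attribute in their row, so a new attribute can be added by stitching in one more dataset without touching the others.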

Page 27: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

LinkedIn big table, the most sought-after data

Segmentation

Propensity Model

Ad hoc analysis

LinkedIn big table

Page 28: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


Segmentation & Targeting

Attribute Serving Engine

Self-service

Attribute predicate expression

Build segments

Build lists

Page 29: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Segmentation & Targeting

Serving Engine

[Diagram: count, filter, sum, and complex expressions over the LinkedIn big table. Labels: ~225M, ~240]

Page 30: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

LinkedIn Segmentation & Targeting Platform

[Diagram labels: LinkedIn big table, Attribute & Definition Metadata, M/R Indexer, Inverted Index (×3)]
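To make the indexing step concrete, here is a minimal sketch of an inverted index built from wide member rows. The slide does not describe the index layout, so the `(attribute, value)` key and the `build_inverted_index` function are assumptions for illustration, and plain Python sets stand in for whatever posting-list format the real indexer produces.

```python
from collections import defaultdict


def build_inverted_index(big_table):
    """Map each (attribute, value) pair to the set of member IDs that have it."""
    index = defaultdict(set)
    for member_id, attrs in big_table.items():
        for name, value in attrs.items():
            index[(name, value)].add(member_id)
    return dict(index)


# Hypothetical big-table rows, reusing the values from the earlier example table.
big_table = {
    1: {"country": "FR", "responded_x1": False},
    2: {"country": "BE", "responded_x1": False},
    3: {"country": "FR", "responded_x1": False},
    4: {"country": "FR", "responded_x1": True},
}

index = build_inverted_index(big_table)
french_members = index[("country", "FR")]
```

With this layout, a single-attribute filter becomes a dictionary lookup instead of a scan over every member row, which is the usual motivation for an inverted index.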

Page 31: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

LinkedIn Segmentation & Targeting Platform

Who are North American recruiters that don’t work for a competitor?

Who are the LinkedIn Talent Solution prospects in Europe?

Who are the job seekers?

Page 32: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

LinkedIn Segmentation & Targeting Platform

[Diagram labels: JSON Predicate Expression, JSON Lucene Query Parser, Inverted Index (×3), Segment & List]

Page 33: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

Complex tree-like attribute predicate expressions
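A tree-like predicate expression can be evaluated recursively against posting lists. The sketch below is an assumption-laden illustration: the JSON shape (`op`, `children`, `child`, `attr`, `value`), the posting data, and the `evaluate` function are all hypothetical, and Python set operations stand in for the Lucene queries the platform actually generates. The example encodes the earlier question about North American recruiters who do not work for a competitor.

```python
import json

# Hypothetical posting lists: (attribute, value) -> member IDs.
postings = {
    ("region", "NA"): {1, 2, 3},
    ("occupation", "recruiter"): {2, 3, 4},
    ("employer", "competitor"): {3},
}
ALL_MEMBERS = {1, 2, 3, 4, 5}


def evaluate(node):
    """Recursively evaluate a predicate tree into a set of member IDs."""
    op = node["op"]
    if op == "term":
        return set(postings.get((node["attr"], node["value"]), set()))
    if op == "not":
        return ALL_MEMBERS - evaluate(node["child"])
    # "and" / "or" nodes are assumed to carry at least one child.
    child_sets = [evaluate(child) for child in node["children"]]
    if op == "and":
        return set.intersection(*child_sets)
    if op == "or":
        return set.union(*child_sets)
    raise ValueError(f"unknown operator: {op}")


expression = json.loads("""
{"op": "and", "children": [
  {"op": "term", "attr": "region", "value": "NA"},
  {"op": "term", "attr": "occupation", "value": "recruiter"},
  {"op": "not", "child": {"op": "term", "attr": "employer", "value": "competitor"}}
]}
""")

segment = evaluate(expression)
```

Because each node returns a set, nesting is unrestricted: any subtree can appear as a child of `and`, `or`, or `not`.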

Page 34: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


LinkedIn Segmentation & Targeting Platform

A marketing campaign is represented by a list

Page 35: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Conclusion

Move at business speed and scale at LinkedIn scale

Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed

Page 36: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)

Engineering Team

Jessica Ho, Swetha Karthik, Raj Rangaswamy, Tony Tong, Ajinkya Harkare, Hien Luu, Sid Anand

Page 37: LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)


Questions?

More info: data.linkedin.com