Eventbrite Data Platform and Services - Interest Graph Based Recommendations

Posted on 15-Jan-2015

Data Platform and Services

Vipul Sharma and Eyal Reuveni

Agenda

• Eventbrite
• Data Products
• Data Platform
• Recommendations
• Questions

• A social event ticketing and discovery platform
• 50 millionth ticket sold
• Revenue doubled year over year
• 180 employees in SoMa, San Francisco
• Solving significant engineering problems
  • Data, Infrastructure, Mobile, Web, Scale, Ops, QA
• Firing on all cylinders and hiring fast: www.eventbrite.com/jobs

Data Products

Analytics

• Ad hoc queries by analysts

Fraud and Spam

Data Platform

Hadoop Cluster

• 30 persistent EC2 High-Memory instances
• 30 TB disk with replication factor of 2, ext3 formatted
• CDH3
• Fair Scheduler
• HBase

Infrastructure

• Search
  • Solr
  • Incremental updates, moving towards event-driven
• Recommendation/Graph
  • Hadoop
  • Native Java MapReduce
  • Bash for workflow
• Persistence
  • MySQL
  • HDFS
  • HBase
  • MongoDB (investigating Cassandra and Riak)

Infrastructure

• Stream
  • RabbitMQ
  • Internal fire hose (investigating Kafka)
• Offline
  • MapReduce
  • Streaming
  • Hive
  • Hue

Infrastructure - Sqoozie

• Workflow for MySQL imports to HDFS
• Generate Sqoop commands
• Run these imports in parallel
• Transparent to schema changes
• Include or exclude at column, data type, or table level
• Data type casting: tinyint(1) to Integer
• Distributed table imports
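The Sqoozie workflow described on this slide could be sketched roughly as follows. This is only an illustration, not Eventbrite's implementation: the table metadata, the JDBC URL, the target directory, and every function name here are hypothetical, and it assumes a Sqoop 1 release that supports `--map-column-java`. It generates one `sqoop import` command per table, applies column and table exclusions, casts `tinyint(1)` columns to `Integer`, and dispatches the commands in parallel (here as a dry run that only renders the command line).

```python
import shlex
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata: table name -> {column name: MySQL type}.
TABLES = {
    "orders": {"id": "int(11)", "event_id": "int(11)", "is_refunded": "tinyint(1)"},
    "events": {"id": "int(11)", "name": "varchar(255)", "is_public": "tinyint(1)"},
}


def build_sqoop_command(table, columns, jdbc_url, target_root, exclude_columns=()):
    """Build the argument list for one `sqoop import` of a single table."""
    kept = [name for name in columns if name not in exclude_columns]
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
        "--columns", ",".join(kept),
    ]
    # Cast tinyint(1) columns to Integer, as the slide describes.
    casts = [f"{name}=Integer" for name in kept
             if columns[name].lower() == "tinyint(1)"]
    if casts:
        cmd += ["--map-column-java", ",".join(casts)]
    return cmd


def plan_imports(tables, jdbc_url, target_root, exclude_tables=(), exclude_columns=()):
    """One command per table, skipping excluded tables and columns."""
    return [
        build_sqoop_command(table, columns, jdbc_url, target_root, exclude_columns)
        for table, columns in tables.items()
        if table not in exclude_tables
    ]


def dry_run(cmd):
    """Render the command line instead of executing it."""
    return " ".join(shlex.quote(part) for part in cmd)


def run_parallel(commands, runner=dry_run, max_workers=4):
    """Dispatch all imports concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    plan = plan_imports(TABLES, "jdbc:mysql://db.example.com/app", "/data/mysql")
    for line in run_parallel(plan):
        print(line)
```

To run the imports for real, a runner such as `lambda cmd: subprocess.run(cmd, check=True)` could be passed in place of `dry_run`. Because the column list is rebuilt from metadata on every run, a new or dropped column changes the generated command without any manual edit, which is one plausible reading of "transparent to schema changes".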

Infrastructure - Blammo

• Raw logs are imported to HDFS via Flume
• Almost real-time: 5 min latency
• Logs are key-value pairs in JSON
• Each log producer publishes its schema in YAML
• Hive schema and schema YAML are kept in sync using Thrift
• Control exclusion and inclusion
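One way to picture the schema sync is as a diff between a producer's published schema and the columns Hive already has. The sketch below is an assumption-laden illustration, not Blammo itself: the schema layout, the field names, and `alter_statements` are hypothetical, the schema is shown as an already-parsed dictionary rather than a YAML file, and the Thrift call that would fetch the current Hive columns is left out.

```python
# Hypothetical schema, as it might look after parsing a producer's YAML file.
PAGE_VIEW_SCHEMA = {
    "table": "page_view",
    "fields": {
        "uid": "bigint",
        "eid": "bigint",
        "url": "string",
        "debug_blob": "string",
    },
}


def alter_statements(schema, hive_columns, exclude=()):
    """Return HiveQL that adds schema fields missing from the Hive table.

    hive_columns: names of the columns the Hive table already has.
    exclude: field names that should never be exposed in Hive.
    """
    missing = [
        (name, hive_type)
        for name, hive_type in schema["fields"].items()
        if name not in hive_columns and name not in exclude
    ]
    if not missing:
        return []
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in missing)
    return [f"ALTER TABLE {schema['table']} ADD COLUMNS ({cols})"]


if __name__ == "__main__":
    current = {"uid", "eid"}  # assumed to come from the Hive metastore
    for stmt in alter_statements(PAGE_VIEW_SCHEMA, current, exclude={"debug_blob"}):
        print(stmt)
```

The `exclude` argument stands in for the inclusion and exclusion control mentioned on the slide.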

Recommendations

You will like to attend this event

Item Hierarchy (You bought a camera so you need batteries - Amazon)

Collaborative Filtering: User-User Similarity (People who bought a camera also bought batteries - Amazon)

Collaborative Filtering: Item-Item Similarity (You like Godfather so you will like Scarface - Netflix)

Social Graph Based (Your friends like Lady Gaga so you will like Lady Gaga, PYMK - Facebook, LinkedIn)

Interest Graph Based (Your friends who, like you, like rock music are attending an Eric Clapton event - Eventbrite)

Recommendation Engines

Why Interest?

• Events are social
• Events are interest
• Dense graph is irrelevant
• Interests are changing

How do we know your Interest?

• We ask you
• Based on your activity
  • Events attended
  • Events browsed
• Facebook interests
  • User interest has to match event category
  • Static
• Machine learning
  • Logistic regression using MLE
  • Sparse matrix is generated using MapReduce
  • A model for each interest
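The machine learning bullet can be illustrated with a small sketch: one logistic regression per interest, fitted by maximizing the log-likelihood with plain gradient ascent over sparse rows. Everything concrete here is an assumption made for illustration: the two features, the toy users, the learning rate, and the function names. The slides do not say which optimizer or features were used, and in the setup described the sparse matrix would come out of MapReduce rather than a Python list.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, n_features, lr=0.5, epochs=200):
    """Maximize the log-likelihood by batch gradient ascent.

    rows: sparse feature rows, each a dict {feature_index: value}.
    labels: 1 if the user has the interest, else 0.
    """
    w = [0.0] * n_features
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(b + sum(w[j] * v for j, v in x.items()))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            grad_b += err
            for j, v in x.items():
                grad_w[j] += err * v
        b += lr * grad_b / n
        for j in range(n_features):
            w[j] += lr * grad_w[j] / n
    return w, b


def predict(model, x):
    """Probability that a user with sparse features x has the interest."""
    w, b = model
    return sigmoid(b + sum(w[j] * v for j, v in x.items()))


# Hypothetical features and toy users; only non-zero entries are stored.
FEATURES = ["attended_rock_event", "attended_tech_event"]
rows = [{0: 1.0}, {0: 1.0}, {1: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}]
labels_by_interest = {
    "rock": [1, 1, 0, 0, 1],
    "tech": [0, 0, 1, 1, 1],
}

# A model for each interest, as on the slide.
models = {
    interest: fit_logistic(rows, labels, len(FEATURES))
    for interest, labels in labels_by_interest.items()
}

if __name__ == "__main__":
    rock_fan = {0: 1.0}
    for interest, model in models.items():
        print(interest, round(predict(model, rock_fan), 3))
```

Training each interest independently keeps the models small and lets a new interest be added without refitting the others.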

Model Based vs Clustering

Building Social Graph is Clustering Step

Social Graph Recommendation is a Ranking Problem

Item-Item vs User-User

Implicit Social Graph

[Figure: graph of users U1-U5 and events E1-E4]

Mixed Social Graph

[Figure: graph of users U1-U5, events E1-E3, and Facebook (FB) and LinkedIn (LI) connections]

15M * 260 * 260 ≈ 1.01 trillion edges

4 billion edges ranked

Each node is a feature vector representing a User

Each edge is a feature vector representing a Relationship

Feature Generation

• Mixed features
• A series of MapReduce jobs
• Output on HDFS in flat files; input to subsequent jobs
• Orders = event attendees
  • MAP: eid:uid
  • REDUCE: eid:[uid]
• Attendees social graph
  • Input: eid:[uid]
  • MAP: uid_i:[uid]
  • REDUCE: uid:[neighbors]
• Interest-based features, user-specific features, graph mining, etc.
• Upload feature values to HBase
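The two chained jobs listed above (orders to event attendees, then attendees to a per-user neighbor list) can be simulated in memory to make the key/value shapes concrete. This is a minimal sketch, not the native Java MapReduce code the slides refer to: `run_job`, the mapper and reducer names, and the sample orders are all hypothetical.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Job 1: orders -> event attendees.
def map_order(order):
    """MAP: emit eid:uid for each (uid, eid) order."""
    uid, eid = order
    yield eid, uid


def reduce_attendees(eid, uids):
    """REDUCE: eid:[uid], deduplicated."""
    return sorted(set(uids))


# Job 2: event attendees -> attendees social graph.
def map_attendees(item):
    """MAP: for each attendee uid_i, emit uid_i:[other attendees]."""
    eid, uids = item
    for uid in uids:
        yield uid, [other for other in uids if other != uid]


def reduce_neighbors(uid, neighbor_lists):
    """REDUCE: uid:[neighbors], merged across all shared events."""
    return sorted({n for neighbors in neighbor_lists for n in neighbors})


# Hypothetical orders as (uid, eid) pairs.
orders = [("u1", "e1"), ("u2", "e1"), ("u3", "e1"), ("u3", "e2"), ("u4", "e2")]

attendees = run_job(orders, map_order, reduce_attendees)
graph = run_job(attendees.items(), map_attendees, reduce_neighbors)

if __name__ == "__main__":
    print(attendees)
    print(graph)
```

The output of the first job is fed directly into the second, mirroring the "output on HDFS in flat files; input to subsequent jobs" pattern. Note that the second mapper emits a list of size n-1 for each of the n attendees of an event, so very large events dominate the shuffle.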

[Figure: user nodes U1, U2, U3 and HBase]

HBase

• Collect data from multiple MapReduce jobs
• Stores entire social graph
• Over one million writes per second

HBase

rowid     neighbors  events  featureX
2718282   101        3       0.3678795

HBase

rowid     314159:n  314159:e  314159:fx  161803:n  161803:e  161803:fx
2718282   31        1         0.3183     83        2         0.618
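The second table appears to store per-relationship features in one wide row, with column names of the form `<id>:<feature>`. Assuming the numeric prefixes are neighbor user IDs and that `n`, `e`, and `fx` abbreviate neighbors, events, and featureX (the slides do not spell this out), the flattening and its inverse could be sketched with plain dictionaries. No HBase client is used here, and both function names are hypothetical.

```python
def to_wide_row(edge_features):
    """Flatten {neighbor_id: {feature: value}} into one wide row.

    Column names follow the '<neighbor_id>:<feature>' pattern from the slide.
    """
    row = {}
    for neighbor_id, features in edge_features.items():
        for name, value in features.items():
            row[f"{neighbor_id}:{name}"] = value
    return row


def from_wide_row(row):
    """Rebuild the nested {neighbor_id: {feature: value}} structure."""
    edges = {}
    for column, value in row.items():
        neighbor_id, name = column.split(":", 1)
        edges.setdefault(neighbor_id, {})[name] = value
    return edges


# Values copied from the slide's example row for rowid 2718282.
edges_of_2718282 = {
    "314159": {"n": 31, "e": 1, "fx": 0.3183},
    "161803": {"n": 83, "e": 2, "fx": 0.618},
}

wide_row = to_wide_row(edges_of_2718282)

if __name__ == "__main__":
    for column, value in wide_row.items():
        print(column, value)
```

With this layout, all edges of a user sit in a single row, so ranking a user's relationships needs one row read rather than one read per neighbor.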

Tips & Tricks

• Distributed cache database
• Sped up some MapReduce jobs by hours
• Be sure to use counters!
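A common way a distributed cache speeds up jobs is the map-side join: a small lookup table is shipped to every mapper and held in memory, so the join needs no shuffle. The sketch below simulates that idea together with job counters. It is an assumption about what the slide refers to, written in Python rather than Hadoop's Java API, and the lookup table, the counter names, and `make_mapper` are hypothetical.

```python
from collections import Counter


def make_mapper(lookup, counters):
    """Build a mapper that joins each record against an in-memory lookup.

    lookup: small table loaded once per mapper (the cached side of the join).
    counters: job-level counters for hits and misses.
    """
    def mapper(record):
        uid, eid = record
        category = lookup.get(eid)
        if category is None:
            # Counting misses makes silent data loss visible after the job.
            counters["events_missing_category"] += 1
            return []
        counters["records_joined"] += 1
        return [(uid, category)]
    return mapper


# Hypothetical cached lookup: event id -> category.
event_category = {"e1": "music", "e2": "tech"}
orders = [("u1", "e1"), ("u2", "e2"), ("u3", "e9")]

counters = Counter()
mapper = make_mapper(event_category, counters)
joined = [pair for record in orders for pair in mapper(record)]

if __name__ == "__main__":
    print(joined)
    print(dict(counters))
```

Checking the miss counter after a run is a cheap way to spot a stale or incomplete cached table.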

Tips & Tricks

• Hive (ab)uses
• Almost as many Hive jobs as custom ones
• “flip join”
• Statistical functions using Hive
• UDF

Tips & Tricks

• Memory, memory, memory
• LZO, WAL
• Combiners are great until
• Shuffle and sorting stage
• Hadoop ecosystem is still new

Questions?
