eventbrite dataplatform and services - interest graph based recommendations

Data Platform and Services Vipul Sharma and Eyal Reuveni

Upload: vipul-sharma

Post on 15-Jan-2015

TRANSCRIPT

Page 1: Eventbrite dataplatform and services - Interest graph based recommendations

Data Platform and Services

Vipul Sharma and Eyal Reuveni

Page 2: Eventbrite dataplatform and services - Interest graph based recommendations

Agenda

• Eventbrite
• Data Products
• Data Platform
• Recommendations
• Questions

Page 3: Eventbrite dataplatform and services - Interest graph based recommendations

• A social event ticketing and discovery platform
• 50 millionth ticket sold
• Revenue doubled year over year
• 180 employees in SoMa, SF
• Solving significant engineering problems
• Data
• Data, Infrastructure, Mobile, Web, Scale, Ops, QA
• Firing on all cylinders and hiring blazing fast: www.eventbrite.com/jobs

Page 4: Eventbrite dataplatform and services - Interest graph based recommendations

Data Products

Page 5: Eventbrite dataplatform and services - Interest graph based recommendations
Page 6: Eventbrite dataplatform and services - Interest graph based recommendations
Page 7: Eventbrite dataplatform and services - Interest graph based recommendations

Analytics

• Ad hoc queries by analysts

Page 8: Eventbrite dataplatform and services - Interest graph based recommendations

Fraud and Spam

Page 9: Eventbrite dataplatform and services - Interest graph based recommendations

Data Platform

Page 10: Eventbrite dataplatform and services - Interest graph based recommendations
Page 11: Eventbrite dataplatform and services - Interest graph based recommendations

Hadoop Cluster

• 30 persistent EC2 High-Memory instances
• 30 TB disk with replication factor of 2, ext3 formatted
• CDH3
• Fair Scheduler
• HBase

Page 12: Eventbrite dataplatform and services - Interest graph based recommendations

Infrastructure

• Search
  • Solr
  • Incremental updates, moving towards event-driven
• Recommendation/Graph
  • Hadoop
  • Native Java MapReduce
  • Bash for workflow
• Persistence
  • MySQL
  • HDFS
  • HBase
  • MongoDB (investigating Cassandra and Riak)

Page 13: Eventbrite dataplatform and services - Interest graph based recommendations

Infrastructure

• Stream
  • RabbitMQ
  • Internal fire hose (investigating Kafka)
• Offline
  • MapReduce
  • Streaming
  • Hive
  • Hue

Page 14: Eventbrite dataplatform and services - Interest graph based recommendations

Infrastructure - Sqoozie

• Workflow for MySQL imports to HDFS
• Generates Sqoop commands
• Runs these imports in parallel
• Transparent to schema changes
• Include or exclude at column, data type, or table level
• Data type casting: tinyint(1) to Integer
• Distributed table imports
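The command-generation and parallel-import steps above could look roughly like the following Python sketch. It is an illustration only, not Sqoozie's actual code: the table metadata, the exclusion rule, the JDBC URL, the target directory layout, and all function names are hypothetical. The Sqoop flags used (`--connect`, `--table`, `--columns`, `--target-dir`, `--map-column-java`) are documented in the Sqoop 1.4 user guide; whether Sqoozie relied on them is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table metadata, as it might be read from MySQL's
# information_schema: table name -> list of (column, MySQL type).
TABLES = {
    "orders": [("id", "int(11)"), ("event_id", "int(11)"), ("is_paid", "tinyint(1)")],
    "events": [("id", "int(11)"), ("name", "varchar(255)"), ("secret", "blob")],
}

EXCLUDED_TYPES = {"blob"}                 # assumed type-level exclusion rule
JAVA_CASTS = {"tinyint(1)": "Integer"}    # the cast named on the slide


def build_sqoop_command(table, columns, jdbc_url, target_root):
    """Build the argument list of one `sqoop import` call for a table."""
    kept = [(col, typ) for col, typ in columns if typ not in EXCLUDED_TYPES]
    casts = [f"{col}={JAVA_CASTS[typ]}" for col, typ in kept if typ in JAVA_CASTS]
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--columns", ",".join(col for col, _ in kept),
        "--target-dir", f"{target_root}/{table}",
    ]
    if casts:
        cmd += ["--map-column-java", ",".join(casts)]
    return cmd


def run_in_parallel(commands, runner, max_workers=4):
    """Run every command concurrently; results come back in input order.

    `runner` is injected so the sketch can be dry-run; in a real workflow
    it could be something like subprocess.call.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/app"   # hypothetical host and database
    commands = [build_sqoop_command(t, cols, url, "/imports")
                for t, cols in TABLES.items()]
    # Dry run: print the commands instead of executing them.
    run_in_parallel(commands, lambda cmd: print(" ".join(cmd)))
```

Because the commands are rebuilt from the current table metadata on every run, a newly added column shows up in the next import without any manual change, which is one plausible reading of "transparent to schema changes".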

Page 15: Eventbrite dataplatform and services - Interest graph based recommendations

Infrastructure - Blammo

• Raw logs are imported to HDFS via Flume
• Almost real-time: 5 min latency
• Logs are key-value pairs in JSON
• Each log producer publishes its schema in YAML
• Hive schema and schema YAML kept in sync using Thrift
• Control exclusion and inclusion
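To make the schema-driven idea above concrete, here is a minimal Python sketch that derives a Hive table definition from a producer schema and projects a JSON log line onto the included fields. Everything named here is hypothetical: the schema layout, the `exclude` flag, and the function names. The schema is shown as an already-parsed dict rather than YAML to keep the example dependency-free, and the Thrift-based sync mentioned on the slide is not modeled.

```python
import json

# Hypothetical schema, shaped like what a log producer might publish in YAML.
SCHEMA = {
    "table": "page_views",
    "fields": [
        {"name": "user_id", "type": "bigint"},
        {"name": "event_id", "type": "bigint"},
        {"name": "url", "type": "string"},
        {"name": "debug_blob", "type": "string", "exclude": True},
    ],
}


def included_fields(schema):
    """Fields that survive the inclusion/exclusion rules."""
    return [fld for fld in schema["fields"] if not fld.get("exclude", False)]


def hive_ddl(schema):
    """Render a CREATE TABLE statement that matches the included fields."""
    cols = ",\n".join(
        f"  `{fld['name']}` {fld['type']}" for fld in included_fields(schema)
    )
    return f"CREATE TABLE IF NOT EXISTS `{schema['table']}` (\n{cols}\n)"


def project_log_line(line, schema):
    """Parse one JSON log line into values in schema order.

    Keys missing from the log become None; excluded or unknown keys are dropped.
    """
    record = json.loads(line)
    return [record.get(fld["name"]) for fld in included_fields(schema)]


if __name__ == "__main__":
    print(hive_ddl(SCHEMA))
    print(project_log_line('{"user_id": 7, "url": "/e/1", "debug_blob": "x"}', SCHEMA))
```

Generating both the table definition and the projection from the same schema object is what keeps the two from drifting apart in this sketch.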

Page 16: Eventbrite dataplatform and services - Interest graph based recommendations

Recommendations

Page 17: Eventbrite dataplatform and services - Interest graph based recommendations

You will like to attend this event

Page 18: Eventbrite dataplatform and services - Interest graph based recommendations

Recommendation Engines

Item Hierarchy (You bought a camera, so you need batteries - Amazon)

Collaborative Filtering – User-User Similarity (People who bought a camera also bought batteries - Amazon)

Collaborative Filtering – Item-Item Similarity (You like Godfather, so you will like Scarface - Netflix)

Social Graph Based (Your friends like Lady Gaga, so you will like Lady Gaga; PYMK - Facebook, LinkedIn)

Interest Graph Based (Your friends who, like you, like rock music are attending an Eric Clapton event - Eventbrite)

Page 19: Eventbrite dataplatform and services - Interest graph based recommendations

Why Interest?

• Events are Social
• Events are Interest
• Dense Graph is Irrelevant
• Interests are Changing

Page 20: Eventbrite dataplatform and services - Interest graph based recommendations

How do we know your Interest?

• We ask you
• Based on your activity
  • Events Attended
  • Events Browsed
• Facebook Interests
  • User Interest has to match Event category
  • Static
• Machine Learning
  • Logistic Regression using MLE
  • Sparse Matrix is generated using MapReduce
  • A model for each interest
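As an illustration of "Logistic Regression using MLE" with "a model for each interest", the sketch below fits one binary logistic regression per interest by batch gradient ascent on the Bernoulli log-likelihood. It is a toy, single-machine stand-in and not the production approach: the feature names, learning rate, epoch count, and function names are assumptions, and sparse rows are represented as plain dicts rather than a MapReduce-generated matrix.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, features):
    """P(user has the interest | sparse feature dict)."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def fit(rows, labels, epochs=200, lr=0.5):
    """Maximize the log-likelihood by batch gradient ascent.

    rows: list of sparse feature dicts; labels: list of 0/1 values.
    """
    weights = {}
    n = len(rows)
    for _ in range(epochs):
        grad = {}
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)   # gradient of the log-likelihood w.r.t. z
            for name, value in x.items():
                grad[name] = grad.get(name, 0.0) + err * value
        for name, g in grad.items():
            weights[name] = weights.get(name, 0.0) + lr * g / n
    return weights


def fit_per_interest(rows, labels_by_interest):
    """Train one independent model per interest, as the slide describes."""
    return {interest: fit(rows, labels)
            for interest, labels in labels_by_interest.items()}


# Hypothetical activity features for four users.
ROWS = [
    {"bias": 1.0, "attended:concert": 1.0},
    {"bias": 1.0, "attended:concert": 1.0, "browsed:festival": 1.0},
    {"bias": 1.0, "attended:conference": 1.0},
    {"bias": 1.0, "browsed:meetup": 1.0},
]

if __name__ == "__main__":
    models = fit_per_interest(ROWS, {"music": [1, 1, 0, 0], "tech": [0, 0, 1, 0]})
    for interest, weights in models.items():
        print(interest, [round(predict(weights, row), 2) for row in ROWS])
```

Keeping the models independent means an interest can be added or retrained without touching the others, at the cost of one training pass per interest.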

Page 21: Eventbrite dataplatform and services - Interest graph based recommendations

Model Based vs Clustering

Building Social Graph is Clustering Step

Social Graph Recommendation is a Ranking Problem

Item-Item vs User-User

Page 22: Eventbrite dataplatform and services - Interest graph based recommendations

Implicit Social Graph

[Figure: graph with user nodes U1 to U5 and event nodes E1 to E4]

Page 23: Eventbrite dataplatform and services - Interest graph based recommendations

Mixed Social Graph

[Figure: graph with user nodes U1 to U5, event nodes E1 to E3, and FB and LI nodes]

Page 24: Eventbrite dataplatform and services - Interest graph based recommendations

15M * 260 * 260 ≈ 1.01 Trillion Edges

4 Billion edges ranked

Each node is a feature vector representing a User

Each edge is a feature vector representing a Relationship

Page 25: Eventbrite dataplatform and services - Interest graph based recommendations

Feature Generation

• Mixed Features
• A series of map-reduce jobs
• Output on HDFS in flat files; input to subsequent jobs
• Orders = Event Attendees
  • MAP: eid:uid
  • REDUCE: eid:[uid]
• Attendees Social Graph
  • Input: eid:[uid]
  • MAP: uid_i:[uid]
  • REDUCE: uid:[neighbors]
• Interest-based features, user-specific, graph mining, etc.
• Upload feature values to HBase
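The two chained jobs above (orders to attendees per event, then attendees to neighbors per user) can be sketched in memory as follows. This is a single-process Python illustration of the map, shuffle, and reduce steps, not the native Java MapReduce code the deck refers to; the record layout and all function names are hypothetical.

```python
from itertools import groupby


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce job: map, sort/shuffle, reduce."""
    pairs = [kv for rec in records for kv in mapper(rec)]
    pairs.sort(key=lambda kv: kv[0])                     # shuffle: group by key
    out = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        out.extend(reducer(key, [value for _, value in group]))
    return out


# Job 1: orders -> eid:[uid]
def map_order(order):
    yield order["eid"], order["uid"]


def reduce_attendees(eid, uids):
    yield eid, sorted(set(uids))


# Job 2: eid:[uid] -> uid:[neighbors]
def map_coattendance(record):
    _eid, uids = record
    for uid in uids:
        # Every other attendee of the same event is a candidate neighbor.
        yield uid, [other for other in uids if other != uid]


def reduce_neighbors(uid, neighbor_lists):
    neighbors = set()
    for lst in neighbor_lists:
        neighbors.update(lst)
    yield uid, sorted(neighbors)


ORDERS = [
    {"eid": "E1", "uid": "U1"}, {"eid": "E1", "uid": "U2"}, {"eid": "E1", "uid": "U3"},
    {"eid": "E2", "uid": "U2"}, {"eid": "E2", "uid": "U4"},
]

if __name__ == "__main__":
    attendees = run_mapreduce(ORDERS, map_order, reduce_attendees)
    graph = run_mapreduce(attendees, map_coattendance, reduce_neighbors)
    for uid, neighbors in graph:
        print(uid, neighbors)
```

The output of the first job is the input of the second, mirroring the "output on HDFS in flat files; input to subsequent jobs" pattern. Note that the second mapper emits a list per attendee per event, so its output grows quadratically with event size.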

Page 26: Eventbrite dataplatform and services - Interest graph based recommendations

[Figure: graph of user nodes U1, U2, and U3]

Page 27: Eventbrite dataplatform and services - Interest graph based recommendations

HBase

Page 28: Eventbrite dataplatform and services - Interest graph based recommendations

HBase

• Collect data from multiple MapReduce jobs
• Stores entire social graph
• Over one million writes per second

Page 29: Eventbrite dataplatform and services - Interest graph based recommendations

HBase

rowid    neighbors  events  featureX
2718282  101        3       0.3678795

Page 30: Eventbrite dataplatform and services - Interest graph based recommendations

HBase

rowid    314159:n  314159:e  314159:fx  161803:n  161803:e  161803:fx
2718282  31        1         0.3183     83        2         0.618
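The wide-row layout in the table above, where each neighbor contributes its own group of columns to a user's row, can be modeled with plain dicts as in the Python sketch below. It does not call any HBase client. The reading of the suffixes (`n`, `e`, `fx`) as per-edge features and the function names are assumptions, and the slide does not say whether the neighbor id is a column family or part of the column qualifier.

```python
def edge_row(uid, edges):
    """Flatten per-neighbor features into one wide row keyed by user id.

    edges: {neighbor_id: {feature_name: value}}
    Returns (rowid, {"<neighbor_id>:<feature_name>": value}).
    """
    columns = {}
    for neighbor_id, features in edges.items():
        for name, value in features.items():
            columns[f"{neighbor_id}:{name}"] = value
    return uid, columns


def features_for_edge(columns, neighbor_id):
    """Read back all features stored for one neighbor of the row's user."""
    prefix = f"{neighbor_id}:"
    return {col[len(prefix):]: value
            for col, value in columns.items() if col.startswith(prefix)}


if __name__ == "__main__":
    rowid, columns = edge_row(2718282, {
        314159: {"n": 31, "e": 1, "fx": 0.3183},
        161803: {"n": 83, "e": 2, "fx": 0.618},
    })
    print(rowid, columns)
    print(features_for_edge(columns, 161803))
```

With this layout, all edges of a user sit in a single row, so one row lookup returns every relationship feature needed to rank that user's neighbors.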

Page 31: Eventbrite dataplatform and services - Interest graph based recommendations

Tips & Tricks

• Distributed cache database
• Sped up some MapReduce jobs by hours
• Be sure to use counters!
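One possible reading of the first and last tips above is a map-side join against a small database file shipped to every task, with counters tracking join hits and misses. The Python sketch below shows that pattern in the style of a Hadoop Streaming mapper. The SQLite file, table name, input format, and function names are all assumptions rather than the authors' setup; the `reporter:counter:<group>,<counter>,<amount>` line written to stderr is the format Hadoop Streaming documents for updating counters.

```python
import sqlite3
import sys


def build_lookup_db(path, rows):
    """Create the small SQLite lookup that would be shipped to every task."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_category (eid TEXT PRIMARY KEY, category TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO event_category VALUES (?, ?)", rows)
    conn.commit()
    return conn


def count(group, counter, amount=1, err=sys.stderr):
    """Emit a counter update in the Hadoop Streaming stderr format."""
    err.write(f"reporter:counter:{group},{counter},{amount}\n")


def map_lines(lines, conn, err=sys.stderr):
    """Map-side join: 'uid<TAB>eid' lines become 'uid<TAB>category' lines."""
    for line in lines:
        uid, eid = line.rstrip("\n").split("\t")
        row = conn.execute(
            "SELECT category FROM event_category WHERE eid = ?", (eid,)
        ).fetchone()
        if row is None:
            count("join", "missing_event", err=err)   # visible in the job's counters
            continue
        count("join", "matched", err=err)
        yield f"{uid}\t{row[0]}"


if __name__ == "__main__":
    # In-memory database for the demo; a real task would open the shipped file.
    db = build_lookup_db(":memory:", [("E1", "music"), ("E2", "tech")])
    for out in map_lines(["U1\tE1\n", "U2\tE9\n"], db):
        print(out)
```

Doing the lookup inside the mapper avoids a separate join job and its shuffle, which is where time savings of the kind the slide mentions could come from; the counters make silent data loss from unmatched keys easy to spot.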

Page 32: Eventbrite dataplatform and services - Interest graph based recommendations

Tips & Tricks

• Hive (ab)uses
• Almost as many Hive jobs as custom ones
• “flip join”
• Statistical functions using Hive
• UDF

Page 33: Eventbrite dataplatform and services - Interest graph based recommendations

Tips & Tricks

• Memory, Memory, Memory
• LZO, WAL
• Combiners are great until
• Shuffle and Sorting stage
• Hadoop ecosystem is still new

Page 34: Eventbrite dataplatform and services - Interest graph based recommendations

Questions?