R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk by: Maximo...

R&D To Product Pipeline Using Apache Spark in Adtech. Maximo Gurmendez, Dr. Sunanda Parthasarathy, Dr. Saket Mengle, DataXu Inc.

Uploaded by spark-summit on 21-Feb-2017.


TRANSCRIPT

Page 1: R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk by: Maximo Gurmendez, Saket Mengle, Sunanda Parthasarathy

R&D To Product Pipeline Using Apache Spark in Adtech

Maximo Gurmendez, Dr. Sunanda Parthasarathy, Dr. Saket Mengle, DataXu Inc.

Page 2

What to expect from this session

• Who is DataXu?

• Why Apache Spark?

• From R&D to product using Apache Spark (demo)

• Analytics using Apache Spark

Page 3

DataXu: Make Marketing Smarter Through Data Science

Who
• Spun out of MIT Labs
• A petabyte-scale digital marketing platform
• One of the fastest-growing companies in the Inc. 5000

What
• Help the world’s most valuable brands understand and engage with their consumers
• Maximize ROI

Quick Statistics
• Billions of ads served per month
• ~10 ms round-trip response time
• 130+ TB of logs per day
• 3,000+ servers powering the platform
• 13 regions, 24x7

Page 4

Real Time Bidding

Page 5

DataXu Machine Learning

[Diagram: Impressions, Clicks, and Activities (S3); Learn Models; Calibrate; Evaluate; Models; Real Time Bidding]
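The DataXu Machine Learning diagram names three stages between the impression, click, and activity logs in S3 and the bidder: learn models, calibrate, and evaluate. The single-machine sketch below shows one common way such stages can fit together. It is an illustration, not DataXu's model. It assumes each impression is labeled 1 if it led to a click and 0 otherwise. Learning uses logistic regression fitted by batch gradient descent. Calibration uses simple histogram binning on held-out data, and evaluation uses ROC AUC. The features, the synthetic data, the binned calibration, and all function names are hypothetical; the slides do not say which techniques DataXu uses. On Spark, MLlib's `LogisticRegression` and `BinaryClassificationEvaluator` would cover the learn and evaluate steps.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # Linear score plus intercept (last weight), squashed to (0, 1).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def learn(rows, labels, epochs=200, lr=1.0):
    """Learn: fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = score(w, x) - y  # gradient of log loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def calibrate(scores, labels, n_bins=5):
    """Calibrate: map each score bin to the click rate observed in that bin on held-out data."""
    counts = [0] * n_bins
    clicks = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        clicks[b] += y
    # Empty bins fall back to their midpoint so every score still maps to a probability.
    rates = [clicks[b] / counts[b] if counts[b] else (b + 0.5) / n_bins for b in range(n_bins)]
    return lambda s: rates[min(int(s * n_bins), n_bins - 1)]

def auc(scores, labels):
    """Evaluate: ROC AUC = share of (click, non-click) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical synthetic impressions: two features in [0, 1]; click odds rise with the first.
random.seed(0)
rows = [[random.random(), random.random()] for _ in range(1000)]
labels = [1 if random.random() < 0.1 + 0.6 * x[0] else 0 for x in rows]

w = learn(rows[:600], labels[:600])                      # learn on 600 impressions
cal_scores = [score(w, x) for x in rows[600:800]]
to_prob = calibrate(cal_scores, labels[600:800])         # calibrate on the next 200
test_scores = [score(w, x) for x in rows[800:]]
test_probs = [to_prob(s) for s in test_scores]           # calibrated click probabilities
test_auc = auc(test_scores, labels[800:])                # evaluate on the last 200
print(f"weights={w}, test AUC={test_auc:.3f}")
```

Keeping calibration separate from learning means the bidder can consume probabilities that match observed click rates even when the raw model scores are skewed, and the evaluation step gives an automatic check before a model is used.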

Page 6

Why is this hard?

Huge Scale
• 2.7 million bid decisions per second
• 3 PB of data processed daily
• Runs 24x7 on 5 continents
• Thousands of ML models trained per day

Unattended Operation
• Model training and deployment run automatically every day

Changing Industry
• Need the ability to adapt quickly to new customer requirements
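To put the "Huge Scale" figures in comparable units, the back-of-the-envelope conversion below turns the per-second bid rate into a daily total and the daily data volume into an average throughput. It assumes both rates are sustained around the clock, as the 24x7 bullet suggests, and uses decimal units (1 PB = 10^15 bytes). The results are averages, not peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

bid_decisions_per_second = 2.7e6
bytes_processed_per_day = 3e15  # 3 PB in decimal units

decisions_per_day = bid_decisions_per_second * SECONDS_PER_DAY
avg_bytes_per_second = bytes_processed_per_day / SECONDS_PER_DAY

print(f"{decisions_per_day:.3g} bid decisions per day")      # prints: 2.33e+11 bid decisions per day
print(f"{avg_bytes_per_second / 1e9:.1f} GB/s on average")   # prints: 34.7 GB/s on average
```

That is roughly 233 billion bid decisions a day, and close to 35 GB of data to process every second on average.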

Page 7

Demo

Page 8

Benchmarks

[Chart: Training Time Comparison. Training time (in sec) vs. number of training records for Logistic Regression, Decision Tree, and Random Forest, each with a linear trend line.]

[Chart: Avg. Bidding Latency (milliseconds) for the current DataXu model and Spark Random Forest.]

[Chart: Model Size in Memory (KB) for Random Forests, Logistic Regression, Naive Bayes, Decision Trees, and the DataXu model.]
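The benchmark slide compares models by average bidding latency and model size in memory, but it does not say how those numbers were measured. The sketch below shows one minimal way to take similar measurements for a Python scoring function. It measures mean wall-clock latency per prediction with `time.perf_counter`, after a short warm-up. For size, it uses the length of the pickled model, which is only a rough proxy for in-memory footprint (`sys.getsizeof` would count just the outer container). The sparse linear model, the feature names, and the helper functions are hypothetical. This is not the benchmark harness used in the talk.

```python
import pickle
import random
import time

def mean_latency_ms(predict, requests, warmup=100):
    """Average wall-clock time per call to predict(), in milliseconds."""
    for x in requests[:warmup]:
        predict(x)  # warm up before timing
    start = time.perf_counter()
    for x in requests:
        predict(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(requests)

def serialized_size_kb(model):
    """Size of the pickled model in KB: a rough proxy for its in-memory footprint."""
    return len(pickle.dumps(model)) / 1024.0

# Hypothetical toy model: feature weights scored as a sparse dot product.
random.seed(1)
weights = {f"feature_{i}": random.uniform(-1, 1) for i in range(1000)}

def predict(active_features):
    return sum(weights.get(f, 0.0) for f in active_features)

# Hypothetical bid requests, each with 20 active features.
requests = [[f"feature_{random.randrange(1000)}" for _ in range(20)] for _ in range(5000)]

latency = mean_latency_ms(predict, requests)
size = serialized_size_kb(weights)
print(f"avg latency: {latency:.4f} ms, model size: {size:.1f} KB")
```

Timing many calls in one loop and dividing amortizes timer overhead, which matters when individual predictions take well under a millisecond.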

Page 9

Why Apache Spark for Adv. Analytics

• Makes advanced analytics a reality: accelerated queries, graph processing, streaming analytics

• Speaks multiple languages (Python, Scala, SQL)

• Makes it easy compared to the complexities of Java/Hadoop

• Accelerates the analyst/data scientist workflow

[Diagram: Real Time Bidding Engine; Adv. Analytics Engine; S3 (metadata)]

Page 10

Advanced Analytics at DataXu

[Diagram: Real Time Bidding Engine; Analytics Engine; Partner/Client Data; Dashboarding/Reporting; S3 (metadata)]
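The Advanced Analytics slide places an analytics engine alongside the real-time bidding engine, partner/client data, and dashboarding/reporting. The single-machine sketch below illustrates the general shape of such a job. It inner-joins event records with a client lookup table, then aggregates impressions, clicks, and click-through rate (CTR) per advertiser for a report. On Spark this would typically be a DataFrame `join` followed by `groupBy(...).agg(...)`. Plain Python stands in here so the logic is easy to follow. All field names, IDs, and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical event records emitted by the bidding engine.
events = [
    {"campaign_id": "c1", "event": "impression"},
    {"campaign_id": "c1", "event": "impression"},
    {"campaign_id": "c1", "event": "click"},
    {"campaign_id": "c2", "event": "impression"},
    {"campaign_id": "c3", "event": "impression"},
    {"campaign_id": "c9", "event": "impression"},  # no client record: dropped by the inner join
]

# Hypothetical partner/client data: which advertiser owns each campaign.
campaign_to_advertiser = {"c1": "acme", "c2": "acme", "c3": "globex"}

def report(events, campaign_to_advertiser):
    """Inner-join events to advertisers, then aggregate counts and CTR per advertiser."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for e in events:
        advertiser = campaign_to_advertiser.get(e["campaign_id"])
        if advertiser is None:
            continue  # inner join: skip events with no matching client record
        if e["event"] == "impression":
            totals[advertiser]["impressions"] += 1
        elif e["event"] == "click":
            totals[advertiser]["clicks"] += 1
    rows = []
    for advertiser, t in sorted(totals.items()):
        ctr = t["clicks"] / t["impressions"] if t["impressions"] else 0.0
        rows.append((advertiser, t["impressions"], t["clicks"], ctr))
    return rows

for row in report(events, campaign_to_advertiser):
    print(row)
```

Each output row is one advertiser's impressions, clicks, and CTR, which is the kind of table a dashboard or report would read.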

Page 11

Analytics Demo

Page 12

Thank You.


We’re hiring! Data Scientists and Data Science Engineers, both full-time employees (FTEs) and interns.