BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012

Upload: amazon-web-services

Post on 05-Dec-2014

DESCRIPTION

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.

TRANSCRIPT

Page 1: BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012
Page 2:

What is Netflix’s data warehouse?

a) Cassandra

b) Teradata

c) Hive

d) S3

Page 3:
Page 4:

DSE Platform

Page 5:
Page 6:
Page 7:

DSE Platform

S3

Chukwa

Page 8:
Page 9:

Aegisthus

Page 10:
Page 11:

DSE Platform

S3

Chukwa

Aegisthus

Page 12:

Sting

Page 13:
Page 14:
Page 15:

DSE Platform

S3

Chukwa

Aegisthus

Sting

Page 16:

What is Netflix’s data warehouse?

a) Cassandra

b) Teradata

c) Hive

d) S3

Page 17:

DSE Platform

S3

Chukwa

Aegisthus

Sting

Page 18:

S3

Page 19:

S3

Page 20:

99.999999999%
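
The slide gives this figure without context; it matches the per-object annual design durability that AWS publishes for Amazon S3. As a rough illustration of what eleven nines means in practice, the sketch below computes the expected number of objects lost per year for a given object count. The independence assumption, the function names, and the object count of 10,000 are illustrative choices, not figures from the talk.

```python
from fractions import Fraction

# Design durability shown on the slide: 99.999999999% per object per year.
# Fraction keeps the arithmetic exact (a float would round 1 - 0.99999999999).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected objects lost in one year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    n = 10_000  # illustrative object count, not taken from the talk
    print(f"expected losses per year: {float(expected_annual_losses(n)):.1e}")
    print(f"years per expected loss:  {int(years_per_expected_loss(n)):,}")
```

Under these assumptions, storing 10,000 objects corresponds to an expected loss of one object roughly every 10,000,000 years.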

Page 21:
Page 22:

S3

Page 23:

S3

High SLA

Query

Page 24:

HDFS?

Page 25:
Page 26:

“Data Science as a Service”

• Execution Service / Genie

• Event Service

• Metadata Service

Page 27:
Page 28:
Page 29:
Page 30:
Page 31:
Page 32:

High SLA Cluster Job

High SLA

S3

Query Cluster Job

Query

Page 33:

High SLA

S3

Query Cluster Job

Query

Page 34:

High SLA Cluster Job

High SLA

S3

Query Cluster Job

Query

Page 35:

High SLA Cluster Job

High SLA

S3

Query Cluster Job

Query

Super SLA Cluster Job

Super SLA
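
One way to read the diagram on this slide: each job is sent to the cluster that matches its service level, and every cluster works against the same data in S3. The sketch below illustrates that reading only. The bucket URI, the tier keys, the `Cluster` and `Job` types, and the `route` function are all hypothetical and do not describe Genie's actual interface or Netflix's implementation.

```python
from dataclasses import dataclass

# Hypothetical shared warehouse location; every cluster points at the same data.
WAREHOUSE_URI = "s3://example-warehouse/"


@dataclass(frozen=True)
class Cluster:
    name: str
    data_uri: str = WAREHOUSE_URI


@dataclass(frozen=True)
class Job:
    name: str
    tier: str  # hypothetical tier key: "super_sla", "high_sla" or "query"


# Cluster names follow the slide; the keys are illustrative.
CLUSTERS = {
    "super_sla": Cluster("Super SLA"),
    "high_sla": Cluster("High SLA"),
    "query": Cluster("Query"),
}


def route(job: Job) -> Cluster:
    """Pick the cluster whose tier matches the job's tier."""
    try:
        return CLUSTERS[job.tier]
    except KeyError:
        raise ValueError(f"unknown tier {job.tier!r} for job {job.name!r}") from None


if __name__ == "__main__":
    for job in (Job("nightly_etl", "high_sla"), Job("adhoc_report", "query")):
        cluster = route(job)
        print(f"{job.name} -> {cluster.name} cluster, reading {cluster.data_uri}")
```

Because the data lives in S3 rather than on any one cluster, a sketch like this can add or drop a cluster by editing the lookup table without moving data.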

Page 36:

High SLA Cluster Job

High SLA

S3

Query Cluster Job

Query

Super SLA Cluster Job

Page 37:
Page 38:

Questions?

http://jobs.netflix.com
