BDT201 AWS Data Pipeline - AWS re:Invent 2012

Upload: amazon-web-services

Post on 01-Jul-2015

DESCRIPTION

In this session, we'll review the features and architecture of the new AWS Data Pipeline service and explain how you can use it to better manage your data-driven workloads. We'll then go over a few examples of setting up and provisioning a pipeline in the system.

TRANSCRIPT

Page 1: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 2: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 3: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 4: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 5: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 6: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 7: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon DynamoDB Amazon S3

Page 8: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 9: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 10: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 11: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 12: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 13: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, HDFS (Amazon EMR)

Page 14: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 15: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 16: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Input Datanode

Activity

[Output Datanode]

Page 17: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Input Datanode with precondition check

Activity with failure & delay notifications

Output Datanode
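The slide above pairs an input precondition check with failure and delay notifications on the activity. A minimal Python sketch of that control flow follows. It is an illustration only, not the service's API: the class `ActivityRun`, its fields, and the status strings `WAITING`, `FAILED`, and `FINISHED` are all hypothetical names chosen for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActivityRun:
    """Hypothetical model of one activity attempt (not the service's API)."""

    precondition: Callable[[], bool]   # e.g. "does the input data exist yet?"
    action: Callable[[], None]         # the work the activity performs
    notify: Callable[[str], None]      # where failure/delay messages are sent
    max_seconds: float                 # runs longer than this count as delayed

    def execute(self) -> str:
        # Do not start the activity until the input precondition holds.
        if not self.precondition():
            return "WAITING"
        started = time.monotonic()
        try:
            self.action()
        except Exception as exc:
            # Failure notification.
            self.notify(f"failure: {exc}")
            return "FAILED"
        # Delay notification: the run succeeded but took too long.
        if time.monotonic() - started > self.max_seconds:
            self.notify("delay: run exceeded expected duration")
        return "FINISHED"


# Usage: collect notifications in a list instead of sending them anywhere.
messages = []
run = ActivityRun(
    precondition=lambda: True,
    action=lambda: None,
    notify=messages.append,
    max_seconds=60.0,
)
status = run.execute()
```

Keeping the precondition, the action, and the notifier as separate callables mirrors the slide's split between the data node check and the activity's notifications.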

Page 18: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 19: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 20: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Compute Resources

Data Data

Data Stores Data Stores

Page 21: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 22: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Start

Interval

[End]

Page 23: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Noon Today

1 hour

Page 24: BDT201 AWS Data Pipeline - AWS re: Invent 2012

…..

12-1pm

1-2pm

2-3pm

X
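The schedule slides describe a start, an interval, and an optional end, with "Noon Today" and "1 hour" producing the periods 12-1pm, 1-2pm, 2-3pm. The sketch below shows one way such periods could be enumerated in Python. The function name `schedule_periods` and the calendar date used are assumptions for illustration; the service's own scheduling rules are not shown in the slides.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional, Tuple


def schedule_periods(
    start: datetime,
    interval: timedelta,
    end: Optional[datetime] = None,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (period_start, period_end) pairs; endless when end is None."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    # Only emit periods that fit completely before the optional end.
    while end is None or current + interval <= end:
        yield current, current + interval
        current += interval


# Usage: start at noon (arbitrary date) with a 1 hour interval and no end.
noon = datetime(2012, 11, 28, 12, 0)
first_three = list(islice(schedule_periods(noon, timedelta(hours=1)), 3))
labels = [f"{s:%H:%M}-{e:%H:%M}" for s, e in first_three]
print(labels)  # ['12:00-13:00', '13:00-14:00', '14:00-15:00']
```

Because the end is optional, the generator is unbounded by default, so callers take only as many periods as they need.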

Page 25: BDT201 AWS Data Pipeline - AWS re: Invent 2012

…..

12-1pm

1-2pm

2-3pm

1 day X

X

Page 26: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Hourly

Daily

Weekly

Monthly

Yearly

Quarterly

Page 27: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 28: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 29: BDT201 AWS Data Pipeline - AWS re: Invent 2012

S3 logs (hourly)

Geolocation data

Per-geography usage computation (hourly)

Redshift results

Page 30: BDT201 AWS Data Pipeline - AWS re: Invent 2012

S3 logs (hourly)
  Precondition: files exist

Geolocation data
  Precondition: ./geo_available

Per-geography usage computation (hourly)

Redshift results

Page 31: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 32: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Dynamo event data

RDS demographics

Hive-based analysis (hourly)

Redshift results

Page 33: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 34: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Hourly click updates

Hourly event analysis

Daily reporting SQL

Page 35: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 36: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3 logs

Custom Precondition

EMR usage-by-geo job

Amazon EC2 report generation

Amazon DynamoDB event data

Amazon RDS demographics

Amazon Redshift DW table

Amazon Redshift DW table

Hive script

Page 37: BDT201 AWS Data Pipeline - AWS re: Invent 2012

Amazon S3 logs

Custom Precondition

EMR usage-by-geo job

Amazon EC2 report generation

Amazon DynamoDB event data

Amazon RDS demographics

Amazon Redshift DW table

Amazon Redshift DW table

Hive script

Page 38: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 39: BDT201 AWS Data Pipeline - AWS re: Invent 2012

We Manage: EC2 Instances, EMR Clusters

You Manage: On Premise Resources, EC2 Instances, EMR Clusters

Page 40: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 41: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 42: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 43: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 44: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 45: BDT201 AWS Data Pipeline - AWS re: Invent 2012

{
  "objects" : [
    {
      "name" : "My Copy",
      "type" : "Copy Action",
      "input" : { "ref" : "My RDS Data" },
      "output" : { "ref" : "My S3 Data" },
      "runsOn" : { "ref" : "My Instance" },
      "schedule" : { "ref" : "My Schedule" }
    },
    {
      "name" : "My Instance",
      "type" : "EC2Instance",
      "instanceType" : "m1.small",
      "schedule" : { "ref" : "My Schedule" }
    },
    …..
  ]
}
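In the definition above, objects point at each other by name through "ref" fields. The Python sketch below loads the two objects shown on the slide and lists the references that do not resolve to a defined object. The helper `unresolved_refs` is a hypothetical illustration, not the service's validation logic, and the objects elided on the slide (the data nodes and the schedule) are deliberately left out rather than guessed.

```python
import json

# Only the two objects visible on the slide; the elided ones are not filled in.
DEFINITION = json.loads("""
{
  "objects": [
    {
      "name": "My Copy",
      "type": "Copy Action",
      "input": {"ref": "My RDS Data"},
      "output": {"ref": "My S3 Data"},
      "runsOn": {"ref": "My Instance"},
      "schedule": {"ref": "My Schedule"}
    },
    {
      "name": "My Instance",
      "type": "EC2Instance",
      "instanceType": "m1.small",
      "schedule": {"ref": "My Schedule"}
    }
  ]
}
""")


def unresolved_refs(definition: dict) -> dict:
    """Map each object name to the refs it uses that no object defines."""
    names = {obj["name"] for obj in definition["objects"]}
    missing: dict = {}
    for obj in definition["objects"]:
        for value in obj.values():
            # A reference is a nested object of the form {"ref": "<name>"}.
            if isinstance(value, dict) and "ref" in value:
                if value["ref"] not in names:
                    missing.setdefault(obj["name"], []).append(value["ref"])
    return missing


missing = unresolved_refs(DEFINITION)
print(missing)
```

With only the slide's two objects loaded, the data nodes and the schedule are reported as missing, which is expected because the slide elides them; "My Instance" resolves.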

Page 46: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 47: BDT201 AWS Data Pipeline - AWS re: Invent 2012

                 On AWS         On Premise

High Frequency   $1/month       $2.50/month

Low Frequency    $0.60/month    $1.50/month
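The rates in the pricing table can be combined into a rough monthly estimate. The slide does not state the billing unit, so the sketch below assumes each rate applies per scheduled item (for example, per activity) per month; that unit, the function `monthly_cost`, and the item counts in the usage example are all assumptions for illustration.

```python
from decimal import Decimal
from typing import Iterable, Tuple

# Rates taken from the slide; the per-item billing unit is an assumption.
RATES = {
    ("high", "aws"): Decimal("1.00"),
    ("high", "on_premise"): Decimal("2.50"),
    ("low", "aws"): Decimal("0.60"),
    ("low", "on_premise"): Decimal("1.50"),
}


def monthly_cost(items: Iterable[Tuple[str, str, int]]) -> Decimal:
    """Sum rate * count over (frequency, location, count) entries."""
    total = Decimal("0")
    for frequency, location, count in items:
        total += RATES[(frequency, location)] * count
    return total


# Usage: 3 high-frequency items on AWS and 2 low-frequency items on premise.
estimate = monthly_cost([("high", "aws", 3), ("low", "on_premise", 2)])
print(estimate)  # 6.00
```

Decimal is used instead of float so that currency amounts add up exactly.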

Page 48: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 49: BDT201 AWS Data Pipeline - AWS re: Invent 2012
Page 50: BDT201 AWS Data Pipeline - AWS re: Invent 2012

We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.