(bdt306) mission-critical stream processing with amazon emr and amazon kinesis | aws re:invent 2014


Uploaded by amazon-web-services on 02-Jul-2015



DESCRIPTION

Organizations processing mission-critical, high-volume data must be able to achieve high levels of throughput and durability in their data processing workflows. In this session, we will learn how DataXu is using Amazon Kinesis, Amazon S3, and Amazon EMR for its patented approach to programmatic marketing. Every second, the DataXu Marketing Cloud processes over 1 million ad requests and makes more than 40 billion decisions to select and bid on the ad impressions that are most likely to convert. In addition to addressing the scalability and availability of the platform, we will explore Amazon Kinesis producer and consumer applications that support high levels of scalability and durability in mission-critical record processing.

TRANSCRIPT

Page 1: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014
Page 2

Collect: AWS Import/Export, AWS Direct Connect, Amazon Kinesis

Store: Amazon S3, Amazon Glacier, Amazon DynamoDB

Analyze: Amazon Redshift, Amazon EMR, Amazon EC2

Page 3

Big data
• Hourly server logs: were your systems misbehaving 1 hr ago?
• Weekly/monthly bill: what you spent this billing cycle
• Daily customer-preferences report from your website's click stream: what deal or ad to try next time
• Daily fraud reports: was there fraud yesterday?

Real-time big data
• What went wrong now
• Prevent overspending now
• What to offer the current customer now
• Block fraudulent use now

Page 4

Sending: HTTP Post, AWS SDK, LOG4J, Flume, Fluentd

Reading: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce

Page 5

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting


Page 12

Page 13

DataXu

Page 14

Page 15

DataXu Records

Confirmation Record

tx_id: "AFTfN0uAWZ"
exchange: "APPNEXUS"
request_id: "bb656107-3bf7-47a7-8548-8229563e9dc9"
…
adslot: {slot_id: "2686449714718898993", uuid: "9d2403f1-fc6c-4d38-b6b1-839fe4b42455", price_micro_cpm: 661385, currency: "USD", seat_id: "12-914", campaign_id: "C0513n7", creative_id: "R53a537"}
time_stamp: 1415393474434
serviced_by_host: "cr02.us-east-01"

Fraud Record

- 69.120.26.172 - - [08/Nov/2014:21:59:54 -0500] "GET /rs?id=fc6f2106175a43df8ae4f3b7e6fa8c37&t=marketing&cbust=1415502000191662 HTTP/1.1" 302 - "http://ads-by.madadsmedia.com/tags/25628/10217/iframe/728x90.html" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)" "wfivefivec=c876d00e-1831-4eba-b78d-cd99188e951a" "OWW=-"
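The fraud record appears to be an Apache-style access-log line. As a minimal sketch, assuming that layout (an extra leading field, then client IP, timestamp, request, status, size, referrer, user agent and two quoted cookie fields), it could be parsed as shown. The regex, the function names and the reading of `price_micro_cpm` as millionths of a dollar per 1,000 impressions are assumptions for illustration, not DataXu's actual record schema.

```python
import re
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

# Sample fraud record from the slide (stray leading bracket dropped).
SAMPLE = ('- 69.120.26.172 - - [08/Nov/2014:21:59:54 -0500] '
          '"GET /rs?id=fc6f2106175a43df8ae4f3b7e6fa8c37&t=marketing&cbust=1415502000191662 HTTP/1.1" '
          '302 - "http://ads-by.madadsmedia.com/tags/25628/10217/iframe/728x90.html" '
          '"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)" '
          '"wfivefivec=c876d00e-1831-4eba-b78d-cd99188e951a" "OWW=-"')

# Assumed layout: lead field, IP, ident, user, [time], "request", status, size,
# "referrer", "user agent", "cookie", "extra".
FRAUD_LINE = re.compile(
    r'^(?P<lead>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<cookie>[^"]*)" "(?P<extra>[^"]*)"$'
)

def parse_fraud_record(line: str) -> Optional[Dict[str, object]]:
    """Split one log line into named fields; return None if it does not match the layout."""
    m = FRAUD_LINE.match(line)
    if m is None:
        return None
    fields: Dict[str, object] = dict(m.groupdict())
    fields["time"] = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")  # timezone-aware
    fields["status"] = int(m["status"])
    # Flatten the request's query string, keeping the first value of each parameter.
    fields["query"] = {k: v[0] for k, v in parse_qs(urlsplit(m["path"]).query).items()}
    return fields

def micro_cpm_to_usd_per_impression(price_micro_cpm: int) -> float:
    """Assumption: micro-CPM = millionths of a currency unit per 1,000 impressions."""
    return price_micro_cpm / 1_000_000 / 1_000

if __name__ == "__main__":
    rec = parse_fraud_record(SAMPLE)
    print(rec["ip"], rec["time"], rec["query"]["t"], rec["cookie"])
    print(micro_cpm_to_usd_per_impression(661385))
```

Keeping the cookie and user-agent fields as separate values is what would let a fraud-inspection consumer key on a session, which ties in with the partition-key choices later in the deck.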

Page 16

Producers: CDN, Real-time Bidding, Retargeting Platform

Aggregator: Amazon Kinesis

Continuous Processing: Real Time Apps, KCL Apps, Archiver, Event Replay

Storage: Amazon S3

Analytics: Reporting, Qubole, Redshift

Page 17

Page 18

Page 19

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 21

Page 22

https://github.com/awslabs/kinesis-log4j-appender

Page 23

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 24

Amazon Kinesis storage is replicated across Availability Zones

• Millions of sources producing 100s of terabytes per hour
• Front end: authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers:
– Real-time dashboards and alarms
– Machine learning algorithms or sliding-window analytics
– Aggregate analysis in Hadoop or a data warehouse
– Aggregate and archive to S3
• Inexpensive: $0.028 per million puts
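To put the quoted price in perspective, here is a back-of-the-envelope calculation, assuming one PUT per event (no batching) at the roughly 1 million ad requests per second mentioned in the session description. It covers only the per-PUT charge on the slide; Kinesis also billed per shard hour, which this sketch leaves out, and current AWS pricing is structured differently.

```python
# Per-PUT price quoted on the slide (2014): $0.028 per million PUTs.
# Assumptions: one PUT per event, no batching; per-shard-hour charges are not included.
PRICE_PER_MILLION_PUTS_USD = 0.028
SECONDS_PER_HOUR = 3600

def put_cost_usd(puts: float) -> float:
    """Cost in USD of `puts` PUT requests at the slide's per-million rate."""
    return puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD

def hourly_put_cost_usd(puts_per_second: float) -> float:
    """PUT cost for one hour of sustained traffic at `puts_per_second`."""
    return put_cost_usd(puts_per_second * SECONDS_PER_HOUR)

if __name__ == "__main__":
    # Illustrative rate: ~1 million events/second, one PUT each.
    print(f"${hourly_put_cost_usd(1_000_000):.2f} per hour")  # $100.80 per hour
```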

Page 25

Page 26

[Chart: throughput of 1 KB messages, in messages/sec (0 to 1,200,000), versus number of shards (0 to 1,100)]
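Shard counts for a target rate can be estimated from Kinesis's documented per-shard write limits of 1,000 records per second and 1 MB per second. At 1,000 records/s per shard, 1 million 1 KB messages per second need about 1,000 shards, which is in line with the ranges on the chart's axes. A minimal sizing sketch, assuming partition keys spread traffic evenly across shards and treating 1 MB as 1,048,576 bytes:

```python
import math

# Per-shard write limits for a Kinesis stream (records/s and bytes/s).
# Assumption: 1 MB is taken as 1,048,576 bytes; load is spread evenly over shards.
MAX_RECORDS_PER_SHARD_PER_SEC = 1_000
MAX_BYTES_PER_SHARD_PER_SEC = 1_048_576

def required_shards(records_per_sec: int, record_bytes: int) -> int:
    """Smallest shard count that satisfies both the record-rate and byte-rate limits."""
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SHARD_PER_SEC)
    by_bytes = math.ceil(records_per_sec * record_bytes / MAX_BYTES_PER_SHARD_PER_SEC)
    return max(1, by_records, by_bytes)

if __name__ == "__main__":
    # 1 KB messages at 1 million messages/sec: the record-rate limit dominates.
    print(required_shards(1_000_000, 1024))  # 1000
```

For small messages the record-rate limit dominates; for large records the byte-rate limit takes over, so both are checked.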

Page 28

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 29

Page 30

[Diagram: Archiver. Host A and Host B read Shard-i of an Amazon Kinesis stream, whose events carry sequence numbers such as 0, 3, 5, 8, 10, 14, 17, 18, 21 and 23 (e.g. {Event 10, …}). A lock table with columns Shard ID, Lock and Seq num holds a row for Shard-i (values shown: 1417182123, 235810), and a second table with columns Shard ID and Last Archived holds Shard-i's last archived sequence number (values shown: 0, 10, 18).]
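The diagram suggests the archiver coordinates hosts through a per-shard lock and a "last archived" checkpoint. One way that could work is sketched here under stated assumptions: a host must hold a time-limited lease on the shard before archiving, it writes the batch before advancing the checkpoint, and object keys are derived from the shard and the batch's first sequence number, so a retry overwrites rather than duplicates. `LockTable`, `Archiver` and the key format are hypothetical, and the dictionaries stand in for shared storage (for example a DynamoDB table updated with conditional writes, and an S3 bucket). This is not DataXu's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (sequence number, payload)

@dataclass
class LockTable:
    """Hypothetical per-shard lease table: shard ID -> (owning host, lease expiry time)."""
    leases: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def try_acquire(self, shard_id: str, host: str, now: float, ttl: float) -> bool:
        # In-memory and single-threaded; a shared table would need an atomic conditional write.
        current = self.leases.get(shard_id)
        if current is None or current[0] == host or current[1] <= now:
            self.leases[shard_id] = (host, now + ttl)
            return True
        return False

@dataclass
class Archiver:
    locks: LockTable = field(default_factory=LockTable)
    last_archived: Dict[str, int] = field(default_factory=dict)    # shard ID -> last archived seq
    objects: Dict[str, List[Event]] = field(default_factory=dict)  # stand-in for S3 objects

    def archive_batch(self, host: str, shard_id: str, events: List[Event],
                      now: float, ttl: float = 30.0) -> Optional[str]:
        """Archive events newer than the shard's checkpoint; return the object key or None."""
        if not self.locks.try_acquire(shard_id, host, now, ttl):
            return None  # another host holds this shard's lease
        checkpoint = self.last_archived.get(shard_id, -1)
        batch = [e for e in events if e[0] > checkpoint]
        if not batch:
            return None
        # Key depends only on the shard and first sequence number, so a retry after a crash
        # between "write" and "checkpoint" overwrites the same object instead of duplicating it.
        key = f"{shard_id}/{batch[0][0]:020d}"
        self.objects[key] = batch
        self.last_archived[shard_id] = batch[-1][0]  # checkpoint only after the write
        return key

if __name__ == "__main__":
    arch = Archiver()
    events = [(0, "e0"), (3, "e3"), (5, "e5"), (8, "e8"), (10, "e10")]
    print(arch.archive_batch("host-a", "shard-i", events, now=0.0))
    print(arch.archive_batch("host-b", "shard-i", events + [(14, "e14")], now=1.0))
    print(arch.archive_batch("host-b", "shard-i", events + [(14, "e14"), (17, "e17")], now=31.0))
```

The second call returns None because host A's lease is still valid; once it expires, host B resumes from the checkpoint at sequence number 10 and archives only events 14 and 17.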

Page 32

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 33

Producers: CDN, Real Time Bidding, Retargeting Platform

Aggregator: Kinesis

Continuous Processing: Real Time Apps, KCL Apps, Archiver, Event Replay

Storage: S3

Analytics: Reporting, Qubole


Page 36

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 37

• Unordered processing
– Randomize the partition key to distribute events over many shards, and use multiple workers.
• Exact-order processing
– Control the partition key so that related events are grouped onto the same shard and read by the same worker.
• Need both? Get a global sequence number.

[Diagram: Producer, Get Global Sequence, Get Event Metadata, Unordered Stream, Campaign-Centric Stream, Fraud Inspection Stream]

Id | Event | Stream – partition key
1 | confirmation | Campaign-centric stream – UUID
2 | fraud | Fraud-inspection stream – sessionid
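The partitioning rules above can be illustrated with a small sketch. Kinesis hashes each partition key with MD5 into a 128-bit hash-key space, and each shard owns a range of that space, so equal keys always land on the same shard. The sketch assumes evenly split shards and reads the digest as a big-endian integer. `route`, the stream names and the process-local counter standing in for "Get Global Sequence" are hypothetical; a real global sequence would need a service shared by all producers.

```python
import hashlib
import itertools
import uuid
from typing import Dict, List

# MD5 maps each partition key into a 128-bit hash-key space.
# Assumption: digest read big-endian; shards split the space evenly (no uneven splits/merges).
HASH_KEY_SPACE = 2 ** 128

def shard_for(partition_key: str, num_shards: int) -> int:
    """Index of the (evenly split) shard whose hash-key range contains this partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_KEY_SPACE

# Hypothetical stand-in for the "Get Global Sequence" step: a process-local counter.
_global_seq = itertools.count(1)

def route(event_type: str, campaign_uuid: str, session_id: str) -> List[Dict[str, object]]:
    """Hypothetical router: tag an event with one global sequence number, then build one
    record per target stream, choosing the partition key per stream."""
    seq = next(_global_seq)
    # Unordered stream: a random key spreads load across shards and workers.
    records: List[Dict[str, object]] = [
        {"stream": "unordered", "partition_key": str(uuid.uuid4()), "global_seq": seq}
    ]
    if event_type == "confirmation":
        # Same campaign UUID -> same shard -> read in order by one worker.
        records.append({"stream": "campaign-centric", "partition_key": campaign_uuid, "global_seq": seq})
    elif event_type == "fraud":
        # Same session ID -> same shard, so a session's events stay in order.
        records.append({"stream": "fraud-inspection", "partition_key": session_id, "global_seq": seq})
    return records

if __name__ == "__main__":
    for rec in route("confirmation", "9d2403f1-fc6c-4d38-b6b1-839fe4b42455", "session-1"):
        print(rec["stream"], shard_for(str(rec["partition_key"]), 8), rec["global_seq"])
```

Consumers of the unordered stream can still correlate or re-sort events by `global_seq`, which is the "need both" case on the slide.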

Page 38

Sending: HTTP Post, AWS SDK, LOG4J, Flume, Fluentd

Reading: Get* APIs, Apache Storm, Amazon Elastic MapReduce

[Diagram additions: Archiver, Amazon S3, Playback, Amazon EMR]
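Playback of archived events can be sketched as reading one shard's archived batches back in sequence order from a chosen starting point. This assumes objects are keyed `<shard>/<zero-padded first sequence number>` (a hypothetical layout) and uses a plain dictionary in place of Amazon S3; how replayed events are then fed to Amazon EMR or another consumer is left out.

```python
from typing import Dict, Iterator, List, Tuple

Event = Tuple[int, str]  # (sequence number, payload)

def playback(objects: Dict[str, List[Event]], shard_id: str, start_seq: int) -> Iterator[Event]:
    """Yield archived events for one shard in sequence order, starting at start_seq."""
    prefix = f"{shard_id}/"
    # Assumes zero-padded keys, so sorting keys lexicographically also orders batches
    # by their first sequence number.
    for key in sorted(k for k in objects if k.startswith(prefix)):
        for seq, payload in objects[key]:
            if seq >= start_seq:
                yield seq, payload

if __name__ == "__main__":
    archived = {
        "shard-i/" + "0" * 20: [(0, "e0"), (3, "e3"), (5, "e5"), (8, "e8"), (10, "e10")],
        "shard-i/" + "0" * 18 + "14": [(14, "e14"), (17, "e17")],
        "shard-j/" + "0" * 20: [(0, "other shard")],
    }
    for seq, payload in playback(archived, "shard-i", start_seq=8):
        print(seq, payload)
```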

Page 39

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Page 40

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals