Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013


DESCRIPTION

Amazon Simple Queue Service (SQS) and Amazon DynamoDB together form a fast, reliable, and scalable layer for receiving and processing high volumes of messages, thanks to their distributed, highly available architectures. We propose a complete system that can handle any volume of data at any level of throughput, without losing messages or requiring other services to be always available. It also enables applications to process messages asynchronously and adds compute resources based on the number of messages enqueued. The architecture helps applications meet predefined SLAs, since we can add workers to improve overall performance, and it lowers total cost because new workers run only briefly and only when they are required.

TRANSCRIPT

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

ARC301 - Controlling the Flood: Massive Message Processing with Amazon SQS and Amazon DynamoDB

Ari Dias Neto, Ecosystem Solution Architect

November 14, 2013

Who am I?

• Ari Dias Neto – Ecosystem Solutions Architect
• The Mailman from Brazil – delivering messages around the world!

Returning all the messages… How many mailmen? When? How long?

What are we going to do?

We are going to design and build an application to handle any volume of messages! Right now!

Scenario – Super Bowl

Promotion: who is going to win?

Requirements

• Subscription based on SMS – the cellphone number is the key
• We cannot lose any message
• We need to process all the valid messages
• Log all the invalid messages and errors
• A beautiful dashboard at the end
• We must process all the messages during the event!

Who is going to be in the front line?

Fast! Scalable! Reliable! Simple! Fully managed!

Amazon Simple Queue Service (SQS)

• Fully managed queue service
• Any volume of data, at any level of throughput
• We cannot lose any message
• No up-front or fixed expenses
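To make the front line concrete, here is a minimal sketch using the AWS SDK for Java (Java is assumed because the talk ships a processor.jar; the queue name "votes" comes from the deployment steps later in the session, while the message body format is purely illustrative):

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClient;

    public class VoteSender {
        public static void main(String[] args) {
            // Credentials come from the environment or the instance role.
            AmazonSQS sqs = new AmazonSQSClient();

            // Create (or look up) the queue that absorbs every vote.
            String queueUrl = sqs.createQueue("votes").getQueueUrl();

            // Each SMS vote becomes one SQS message; the body format is illustrative.
            sqs.sendMessage(queueUrl, "+5511999990000;TEAM_A");
        }
    }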

Architecture – Starting with SQS

We have received all the messages. Now we need to process all of them.

Architecture – Amazon EC2 Instances

But how many instances?

Architecture – Multithreaded application

Reduce costs and increase performance: run multiple worker threads on each EC2 instance.

Architecture

SQS → worker threads on EC2 instances. But how many instances do we need?

EC2 m1.xlarge:

• 1 instance → 100K msgs/minute
• 10 instances → 1M msgs/minute
• 10 instances → 5M messages in 5 minutes (worked out below)
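The sizing is straight arithmetic: required instances = total messages ÷ (per-instance rate × available minutes) = 5,000,000 ÷ (100,000 × 5) = 10.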

Architecture

SQS + Auto Scaling group: Auto Scaling based on the number of messages in the queue.
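A sketch of how that queue-depth trigger could be wired up with the SDK for Java (the group name votes-workers, the threshold, and the adjustment size are assumptions, not values from the talk):

    import com.amazonaws.services.autoscaling.AmazonAutoScaling;
    import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
    import com.amazonaws.services.autoscaling.model.PutScalingPolicyRequest;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
    import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
    import com.amazonaws.services.cloudwatch.model.Dimension;
    import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
    import com.amazonaws.services.cloudwatch.model.Statistic;

    public class QueueDepthScaling {
        public static void main(String[] args) {
            AmazonAutoScaling autoScaling = new AmazonAutoScalingClient();
            AmazonCloudWatch cloudWatch = new AmazonCloudWatchClient();

            // Scaling policy: add one worker instance each time the alarm fires.
            String policyArn = autoScaling.putScalingPolicy(new PutScalingPolicyRequest()
                    .withAutoScalingGroupName("votes-workers")   // illustrative name
                    .withPolicyName("scale-up-on-backlog")
                    .withAdjustmentType("ChangeInCapacity")
                    .withScalingAdjustment(1))
                    .getPolicyARN();

            // Alarm on the number of messages waiting in the 'votes' queue.
            cloudWatch.putMetricAlarm(new PutMetricAlarmRequest()
                    .withAlarmName("votes-backlog-high")
                    .withNamespace("AWS/SQS")
                    .withMetricName("ApproximateNumberOfMessagesVisible")
                    .withDimensions(new Dimension().withName("QueueName").withValue("votes"))
                    .withStatistic(Statistic.Sum)
                    .withPeriod(60)
                    .withEvaluationPeriods(1)
                    .withThreshold(100000.0)                     // illustrative threshold
                    .withComparisonOperator(ComparisonOperator.GreaterThanThreshold)
                    .withAlarmActions(policyArn));
        }
    }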

Architecture

SQS + Auto Scaling group. Where should we save all the messages? High throughput needed.

Amazon DynamoDB

Two tables: valid-votes and invalid-votes.
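A sketch of that two-table setup with the SDK for Java (the table names come from the slide; the hash-key attribute name "phone" and the throughput figures are assumptions):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
    import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
    import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
    import com.amazonaws.services.dynamodbv2.model.KeyType;
    import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
    import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

    public class VoteTables {
        public static void main(String[] args) {
            AmazonDynamoDB dynamo = new AmazonDynamoDBClient();
            for (String table : new String[] {"valid-votes", "invalid-votes"}) {
                dynamo.createTable(new CreateTableRequest()
                        .withTableName(table)
                        // The cellphone number is the key, per the requirements slide.
                        .withKeySchema(new KeySchemaElement("phone", KeyType.HASH))
                        .withAttributeDefinitions(
                                new AttributeDefinition("phone", ScalarAttributeType.S))
                        // Illustrative throughput; size writes to the expected vote rate.
                        .withProvisionedThroughput(new ProvisionedThroughput(100L, 1000L)));
            }
        }
    }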

Architecture

SQS → Auto Scaling group → DynamoDB tables valid-votes and invalid-votes.

The Dashboard

Final Architecture

SQS → Auto Scaling group of workers → DynamoDB, plus a web dashboard running in an AWS Elastic Beanstalk container.

Benefits

• Ready for any level of throughput – SQS
• Ready for any required SLA – Auto Scaling and EC2
• Low cost:
  – Fully managed queue service
  – Infrastructure sized to the required SLA
  – Infrastructure needed only for a small period of time

The challenge!

Process all the messages from the queue in 10 minutes!

Let's go deep! Let's code!

Each thread (sketched in code below):

1. Connect to the SQS queue
2. Read up to 10 messages
3. Validate each message
4. Save each message as valid or invalid in DynamoDB
5. Mark the message as processed by deleting it from the queue
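A minimal sketch of one worker thread with the SDK for Java (valid-votes and invalid-votes are the tables from the earlier slides; the "phone;vote" body format and the validation rule are assumptions):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    import java.util.HashMap;
    import java.util.Map;

    public class VoteWorker implements Runnable {
        private final AmazonSQS sqs = new AmazonSQSClient();
        private final AmazonDynamoDB dynamo = new AmazonDynamoDBClient();
        private final String queueUrl = sqs.getQueueUrl("votes").getQueueUrl();

        @Override
        public void run() {
            while (true) {
                // Read up to 10 messages per call; long polling avoids empty receives.
                for (Message msg : sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)
                        .withMaxNumberOfMessages(10)
                        .withWaitTimeSeconds(20)).getMessages()) {

                    // Validate: expected body format "<phone>;<vote>" (an assumption).
                    String[] parts = msg.getBody().split(";");
                    boolean valid = parts.length == 2 && parts[0].matches("\\+?\\d+");

                    // Save as valid or invalid; the phone number is the key.
                    Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
                    item.put("phone", new AttributeValue(valid ? parts[0] : msg.getMessageId()));
                    item.put("body", new AttributeValue(msg.getBody()));
                    dynamo.putItem(valid ? "valid-votes" : "invalid-votes", item);

                    // Delete the message so no other worker processes it again.
                    sqs.deleteMessage(queueUrl, msg.getReceiptHandle());
                }
            }
        }
    }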


Steps to deploy it on AWS:

1. Create the queue. Queue name: votes
2. Create an AMI with the JRE. Image ID: ami-05355a6c
3. Upload the application to S3: s3-sa-east-1.amazonaws.com/arineto/processor.jar
4. Create the bootstrap script: userdata.txt
5. Create the launch configuration (see the sketch below)
6. Create the Auto Scaling group
7. Create the alarms
8. Launch it!
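Steps 5 and 6 could look like this with the SDK for Java (a sketch: the names votes-workers-lc and votes-workers, the availability zone, and the group sizes are assumptions; the AMI ID and instance type come from the slides, and userdata.txt is the bootstrap script that fetches and runs processor.jar):

    import com.amazonaws.services.autoscaling.AmazonAutoScaling;
    import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
    import com.amazonaws.services.autoscaling.model.CreateAutoScalingGroupRequest;
    import com.amazonaws.services.autoscaling.model.CreateLaunchConfigurationRequest;
    import com.amazonaws.util.Base64;

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class DeployWorkers {
        public static void main(String[] args) throws Exception {
            AmazonAutoScaling autoScaling = new AmazonAutoScalingClient();

            // User data: the userdata.txt bootstrap, Base64-encoded as the API requires.
            String userData = Base64.encodeAsString(
                    Files.readAllBytes(Paths.get("userdata.txt")));

            // Launch configuration: the JRE image from the slide plus the bootstrap.
            autoScaling.createLaunchConfiguration(new CreateLaunchConfigurationRequest()
                    .withLaunchConfigurationName("votes-workers-lc")
                    .withImageId("ami-05355a6c")         // AMI with JRE, per the slide
                    .withInstanceType("m1.xlarge")
                    .withUserData(userData));

            // The group that the queue-depth alarms grow and shrink.
            autoScaling.createAutoScalingGroup(new CreateAutoScalingGroupRequest()
                    .withAutoScalingGroupName("votes-workers")
                    .withLaunchConfigurationName("votes-workers-lc")
                    .withAvailabilityZones("us-east-1a") // illustrative AZ
                    .withMinSize(0)
                    .withMaxSize(10));
        }
    }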

The Company

• BigData Corp. was founded to help companies solve the challenges associated with big data, from collection to processing to information and knowledge extraction.

The Challenge

• "How many e-commerce websites exist in your continent? Can we monitor them on a consistent basis?" – Build a crawling process that can answer this question in a cost-effective and speedy manner.

Architecture

• Spot Instances + SQS + S3 = Magic (see the sketch after this list)
  – Spot Instances allow us to optimize processing costs
  – Amazon SQS allows us to orchestrate the process in a distributed and asynchronous manner
  – Amazon Simple Storage Service (S3) facilitates the storage of intermediate and final processing results
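A sketch of the spot request behind that design, with the SDK for Java (the bid price, instance count, and AMI ID are all illustrative, not from the talk):

    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.LaunchSpecification;
    import com.amazonaws.services.ec2.model.RequestSpotInstancesRequest;

    public class SpotCrawlers {
        public static void main(String[] args) {
            AmazonEC2 ec2 = new AmazonEC2Client();

            // Bid for cheap capacity. Because workers pull crawl URLs from SQS,
            // an interrupted instance's in-flight message simply becomes visible
            // again for another worker to pick up.
            ec2.requestSpotInstances(new RequestSpotInstancesRequest()
                    .withSpotPrice("0.05")               // illustrative bid
                    .withInstanceCount(20)               // illustrative fleet size
                    .withLaunchSpecification(new LaunchSpecification()
                            .withImageId("ami-12345678") // illustrative crawler AMI
                            .withInstanceType("m1.large")));
        }
    }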

Architecture

• Maestro (reserved instance) – maintains the list of crawl URLs
• Main Workers (Spot Instances) – execute crawling and process data
• Secondary Workers (Spot Instances, queue listeners) – reprocess data, query additional services, and store data on MongoDB
• Secondary work queues – carry processed data to the MongoDB cluster
• Command and Control Queue

Architecture (3)

• Message volumes
  – Processing starts by uploading 10MM+ messages
  – Each processed message may generate up to 10 new intermediate messages
  – Peak processing of 70K messages/second
• Command & Control Queue
  – Enables us to adjust processing as we go and request status checks from instances

Results (1)

[Chart: estimated cost without AWS vs. cost with AWS over months 0–12, on a scale of $0 to $900,000.]

Results (2)

• 2+ PB of data processed
• 40+ billion web pages visited and parsed
• 500+ services and technologies mapped
• A completely new view of the web market

Please give us your feedback on this presentation. As a thank you, we will select prize winners daily for completed surveys!

ARC301
