aws webinar - dynamo db + redshift 13_09_19

Designing for Scale: Three steps to optimal data performance using DynamoDB and Redshift. David Pearson, Business Development

Upload: amazon-web-services

Post on 26-Jan-2015


DESCRIPTION

Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud, and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. Using existing SQL-based tools and business intelligence systems, you will learn how to gain deeper insight from your data at lower cost. The design principles presented here will be useful to every environment where managing data at scale is a challenge.

TRANSCRIPT

Page 1: AWS Webinar - Dynamo DB + Redshift 13_09_19

Designing for Scale Three steps to optimal data performance

using DynamoDB and Redshift

David Pearson Business Development

Page 2: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon RDS

Amazon DynamoDB Amazon Redshift

Amazon ElastiCache

Compute Storage

AWS Global Infrastructure

Database

Application Services

Deployment & Administration

Networking

AWS Database

Services

Scalable High Performance

Application Storage in the Cloud

Page 3: AWS Webinar - Dynamo DB + Redshift 13_09_19

provision

manage

scale

EFFORT

differentiated?

Page 4: AWS Webinar - Dynamo DB + Redshift 13_09_19

Introduction to AWS Big Data Services

Redshift DynamoDB

Elastic MapReduce Amazon S3

Object Storage

Batch Processing

Real-Time Transactions

Online Analysis and Reporting

Page 5: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon DynamoDB

Page 6: AWS Webinar - Dynamo DB + Redshift 13_09_19

NoSQL Database

Predictable performance

Seamless & massive scalability

Fully managed; zero admin

Amazon DynamoDB

Page 7: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon’s Path to DynamoDB

RDBMS DynamoDB

Page 8: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon DynamoDB

DEVS

OPS

USERS

Page 9: AWS Webinar - Dynamo DB + Redshift 13_09_19

Fast Application Development

Time to Build New Applications

• Flexible data models

• Simple API

• High-scale queries

• Laptop development

Amazon DynamoDB

DEVS

OPS

USERS

Page 10: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon DynamoDB

DEVS

OPS

USERS

Admin-Free (at any scale)

Page 11: AWS Webinar - Dynamo DB + Redshift 13_09_19

request-based capacity provisioning model

Provisioned Throughput

Throughput is declared and updated via the API or the console

CreateTable (foo, reads/sec = 100, writes/sec = 150)

UpdateTable (foo, reads/sec=10000, writes/sec=4500)

DynamoDB handles the rest

Capacity is reserved and available when needed

Scaling-up triggers repartitioning and reallocation

No impact to performance or availability
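The CreateTable and UpdateTable calls on this slide are written as shorthand. A minimal sketch of the request parameters they correspond to in the DynamoDB API is shown below. The hash key name `id` and its string type are assumptions made for illustration (the slide names only the table `foo`), and the slide's "reads/sec" and "writes/sec" are mapped directly onto read and write capacity units, which is a simplification for small items.

```python
def create_table_request(table, hash_key, reads_per_sec, writes_per_sec):
    """Build the parameter dict for a DynamoDB CreateTable call."""
    return {
        "TableName": table,
        # ASSUMPTION: a single string hash key; the slide does not name one.
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


def update_table_request(table, reads_per_sec, writes_per_sec):
    """Build the parameter dict for a DynamoDB UpdateTable call."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


# The two calls from the slide, expressed as request payloads.
create_req = create_table_request("foo", "id", 100, 150)
update_req = update_table_request("foo", 10000, 4500)
```

With the boto3 SDK installed and credentials configured, these dicts could be passed as keyword arguments, for example `client.create_table(**create_req)` and `client.update_table(**update_req)`.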

Page 12: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon DynamoDB

DEVS

OPS

USERS Durable Low Latency

Page 13: AWS Webinar - Dynamo DB + Redshift 13_09_19

WRITES Replicated continuously to 3 AZs

Persisted to disk (custom SSD)

READS Strongly or eventually consistent

No latency trade-off

Page 14: AWS Webinar - Dynamo DB + Redshift 13_09_19

Latest News… DynamoDB Local

• Disconnected development

• Full API support

• Download from http://aws.amazon.com/dynamodb/resources/#testing

Page 15: AWS Webinar - Dynamo DB + Redshift 13_09_19

“Compared to similar products, DynamoDB provides an amazing feature set, including super low latencies, (literally) push-button scaling, automatic data persistence, and seamless integration with Redshift and other AWS services.”

Peter Bogunovich, RightAction Inc

Page 16: AWS Webinar - Dynamo DB + Redshift 13_09_19

AD SERVING

Page 17: AWS Webinar - Dynamo DB + Redshift 13_09_19

EC2

Profiles Database

ad request

ad url

visitor

Ad Servers

DynamoDB

1. Visitor loads a web page

2. Web page issues a request to ad servers on EC2

3. Query to DynamoDB returns the ad to display

4. Link is returned to visitor

cookie hash=userid range=timestamp

user-profile hash=userid
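The two table definitions on this slide (cookie: hash=userid, range=timestamp; user-profile: hash=userid) can be sketched as DynamoDB key schemas, together with the kind of query an ad server might issue in step 3. This is an illustrative sketch only: the string type for `userid` and the "fetch the most recent cookie entry" query are assumptions, not something the slide specifies.

```python
def key_schema(hash_key, range_key=None):
    """Return a DynamoDB KeySchema list for a hash or hash-and-range table."""
    schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    return schema


# Key layouts taken from the slide.
TABLES = {
    "cookie": key_schema("userid", "timestamp"),
    "user-profile": key_schema("userid"),
}


def latest_cookie_query(userid):
    """Build Query parameters for the newest cookie item of one user.

    ASSUMPTION: userid is stored as a string; the slide gives no types.
    """
    return {
        "TableName": "cookie",
        "KeyConditionExpression": "userid = :u",
        "ExpressionAttributeValues": {":u": {"S": userid}},
        "ScanIndexForward": False,  # descending by range key (timestamp)
        "Limit": 1,                 # only the most recent item
    }


query = latest_cookie_query("user-123")  # "user-123" is a made-up id
```

Because the range key is the timestamp, items for one user are stored in time order, so a descending query with `Limit` set to 1 reads a single item rather than the user's full history.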

Page 18: AWS Webinar - Dynamo DB + Redshift 13_09_19

EC2

Profiles Database Ad Servers

DynamoDB

Real-time bidding platform

Bidder DynamoDB

Ads Profiles Queues and Buffer Bid response

20 ms

20 ms 20 ms 40 ms

Request network transit

Response network transit Decision on best ad and bid price based on optimization that needs multiple data look-ups

Contingency time buffer

Bid request

real-time bidding
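The four timings on this slide add up to a 100 ms budget per bid. The extracted text does not preserve which figure belongs to which stage, so the mapping below is an assumption (40 ms for the bidder's decision, 20 ms for each of the other stages); only the individual values and their total come from the slide.

```python
# Latency budget for one bid request, in milliseconds.
# ASSUMPTION: the stage-to-value mapping is a guess; the slide text lists
# the values 20, 20, 20 and 40 without tying them to specific stages.
BUDGET_MS = {
    "request network transit": 20,
    "bidder decision": 40,
    "response network transit": 20,
    "contingency buffer": 20,
}

TOTAL_MS = sum(BUDGET_MS.values())


def max_sequential_lookups(per_lookup_ms, decision_ms=BUDGET_MS["bidder decision"]):
    """Number of back-to-back data look-ups that fit in the decision window."""
    return int(decision_ms // per_lookup_ms)
```

Under this assumed split, a hypothetical 5 ms look-up would leave room for `max_sequential_lookups(5)` sequential reads inside the decision window, which is why predictable low-latency reads matter for the bidder.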

Page 19: AWS Webinar - Dynamo DB + Redshift 13_09_19

EC2

Profiles Database

ad request

ad url

visitor

Ad Servers

DynamoDB

1. Ad files are downloaded from CloudFront

2. Impressions captured in logs to S3

CloudFront

advertisement

impression logs

Static Repository Files

Amazon S3

Page 20: AWS Webinar - Dynamo DB + Redshift 13_09_19

CloudFront

advertisement

impression logs

Static Repository Files

Amazon S3

Profiles Database

EC2 (MAZ)

ad request

ad url

Ad Servers

DynamoDB Elastic Load Balancing

visitor

Click-through Servers

click through log files

click through requests

Elastic Load Balancing

Page 21: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon Redshift

Page 22: AWS Webinar - Dynamo DB + Redshift 13_09_19

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed; zero admin

Amazon Redshift

Page 23: AWS Webinar - Dynamo DB + Redshift 13_09_19

• Direct-attached storage

• Large data block sizes

• Columnar storage

• Data compression

• Zone maps

Redshift dramatically reduces I/O

Id   Age  State
123  20   CA
345  25   WA
678  40   FL

Row storage Column storage
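The difference between the two layouts can be sketched with the three-row example from this slide. This is a toy in-memory model, not how Redshift stores blocks on disk; it only counts how many stored values a query over one column has to touch in each layout.

```python
COLUMNS = ("Id", "Age", "State")
ROWS = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Row storage: all values of a record sit next to each other.
row_layout = [value for row in ROWS for value in row]

# Column storage: all values of a column sit next to each other.
column_layout = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)
}


def avg_age_row_store():
    """Average age from the row layout; every stored value is scanned."""
    values_read = len(row_layout)
    ages = row_layout[1::len(COLUMNS)]  # Age is the second field of each row
    return sum(ages) / len(ages), values_read


def avg_age_column_store():
    """Average age from the column layout; only the Age column is read."""
    ages = column_layout["Age"]
    return sum(ages) / len(ages), len(ages)
```

Both functions return the same average, but the row layout reads 9 values and the column layout reads 3. On wide tables the gap grows with the number of columns the query does not need.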

Page 24: AWS Webinar - Dynamo DB + Redshift 13_09_19

• Load

• Query

• Resize

• Backup

• Restore

Redshift parallelizes and distributes everything

Compute Node 16TB

10 GigE (HPC)

Ingestion Backup Restore

SQL Clients / BI Tools

Amazon S3

Client VPC

Compute Node 16TB

Compute Node 16TB

Leader Node

Page 25: AWS Webinar - Dynamo DB + Redshift 13_09_19

Start small and grow big

Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE

Cluster 2-100 Nodes (32 TB – 1.6 PB)

note: nodes not to scale

Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE

Single Node (2 TB)

Cluster 2-32 Nodes (4 TB – 64 TB)

Page 26: AWS Webinar - Dynamo DB + Redshift 13_09_19

Monitor query performance

Page 27: AWS Webinar - Dynamo DB + Redshift 13_09_19

View explain plans

Page 28: AWS Webinar - Dynamo DB + Redshift 13_09_19

Redshift works with existing BI tools

JDBC/ODBC

Amazon Redshift

More coming soon…

Page 29: AWS Webinar - Dynamo DB + Redshift 13_09_19

Redshift is Priced to Analyze All Your Data

$0.85 per hour for on-demand (2TB)

$999 per TB per year (3-yr reservation)
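The two prices on this slide use different units, so a rough per-terabyte comparison helps. The sketch below assumes a 2 TB on-demand node running every hour of a 365-day year; it ignores any upfront reservation payment structure and is arithmetic on the slide's figures only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760

ON_DEMAND_PER_HOUR = 0.85       # USD, one 2 TB node (from the slide)
NODE_TB = 2
RESERVED_PER_TB_YEAR = 999.0    # USD, 3-year reservation (from the slide)

# Cost of running one on-demand node for a full year.
on_demand_per_node_year = ON_DEMAND_PER_HOUR * HOURS_PER_YEAR  # about 7,446

# Normalize to cost per terabyte per year.
on_demand_per_tb_year = on_demand_per_node_year / NODE_TB      # about 3,723

# How many times more expensive on-demand is per TB, under these assumptions.
ratio = on_demand_per_tb_year / RESERVED_PER_TB_YEAR           # about 3.7
```

Under these assumptions, continuous on-demand use costs roughly 3.7 times the reserved price per terabyte, which suggests on-demand for evaluation and reservations for steady workloads.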

Page 30: AWS Webinar - Dynamo DB + Redshift 13_09_19

“Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” – Niek Sanders, VP Engineering

Realized a 20x – 40x reduction in query times

“Redshift is the real deal”

Page 31: AWS Webinar - Dynamo DB + Redshift 13_09_19

Analysis

Page 32: AWS Webinar - Dynamo DB + Redshift 13_09_19

CloudFront

advertisement

impression logs

Static Repository Files

Amazon S3

Profiles Database

EC2 (MAZ)

ad request

ad url

Ad Servers

DynamoDB Elastic Load Balancing

visitor

Amazon Redshift

bid history user history

ETL Click-through Servers

click through log files

click through requests

Elastic Load Balancing

Amazon EMR

updated profiles

impressions

new requests user history

Page 33: AWS Webinar - Dynamo DB + Redshift 13_09_19

Amazon Redshift

Drive qualified users to advertiser’s sites

• Ad server logs

• 3rd party data

• Bid history

• User history

Bid Optimization

Optimizing with Redshift

Optimize return on advertising expenditure

• Impressions

• 3rd party data

• User history

• Enrichment

Cost Optimization

Page 34: AWS Webinar - Dynamo DB + Redshift 13_09_19

1. Describe the full lifecycle of data

Identify data consumption patterns, expected data volumes and SLAs (latency, availability, durability) at each point on the timeline

2. Leverage specialized options

DynamoDB – real-time transaction processing

Redshift – online reporting and analysis

EMR – enrichment

S3 – data staging

Three steps to optimal data performance

Page 35: AWS Webinar - Dynamo DB + Redshift 13_09_19

3. Optimize access patterns

Design database schemas for maximum efficiency

DynamoDB

» minimize payloads

» separate hot data from cold

Redshift

» good distribution and sort key selection – test as needed

» efficient ingestion (from DynamoDB and S3)

Three steps to optimal data performance
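The Redshift items in step 3 (distribution and sort key selection, ingestion from DynamoDB and S3) can be sketched as the SQL statements involved, built here as Python strings. The `impressions` table, its columns, the S3 path, and the IAM role ARN are all hypothetical placeholders; only the `user-profile` DynamoDB table name comes from the deck. `DISTKEY`, `SORTKEY`, `COPY ... FROM 'dynamodb://...'` and `READRATIO` are Redshift SQL, but the specific key choices are assumptions to be tested against real query patterns.

```python
# Placeholder role ARN; substitute a real one before running any statement.
IAM_ROLE = "arn:aws:iam::<account-id>:role/<redshift-copy-role>"


def create_table_sql(table, columns, dist_key, sort_key):
    """Render a CREATE TABLE statement with a distribution and sort key."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({sort_key});"
    )


def copy_from_dynamodb_sql(table, dynamodb_table, read_ratio):
    """Render a COPY that loads a Redshift table from a DynamoDB table.

    READRATIO is the percentage of the DynamoDB table's provisioned read
    throughput that the load is allowed to consume.
    """
    return (
        f"COPY {table} FROM 'dynamodb://{dynamodb_table}'\n"
        f"IAM_ROLE '{IAM_ROLE}'\n"
        f"READRATIO {read_ratio};"
    )


def copy_from_s3_sql(table, s3_prefix):
    """Render a COPY that loads gzip-compressed files under an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{IAM_ROLE}'\n"
        f"GZIP;"
    )


# Hypothetical schema: distribute by user so one user's rows are co-located,
# sort by event time so time-range filters can skip blocks.
ddl = create_table_sql(
    "impressions",
    [("userid", "VARCHAR(64)"), ("ts", "TIMESTAMP"), ("ad_id", "VARCHAR(64)")],
    dist_key="userid",
    sort_key="ts",
)
load_profiles = copy_from_dynamodb_sql("user_profile", "user-profile", 50)
load_logs = copy_from_s3_sql("impressions", "s3://<bucket>/impression-logs/")
```

Loading many files under one S3 prefix lets the COPY run in parallel across the cluster, and a READRATIO below 100 leaves DynamoDB read capacity available for the live ad servers during the load.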

Page 36: AWS Webinar - Dynamo DB + Redshift 13_09_19

DynamoDB

• Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/

• Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing

Redshift

• Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html

• Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html

Resources

Page 37: AWS Webinar - Dynamo DB + Redshift 13_09_19

Questions