rethinking the database for the cloud (ijaws)

Rethinking the database for the cloud AWS database services best practices Amazon Data Services Japan Rasmus Ekman

Upload: rasmus-ekman

Post on 26-Jan-2015


Page 1: Rethinking the database for the cloud (iJAWS)

Rethinking the database for the cloud
AWS database services best practices

Amazon Data Services Japan
Rasmus Ekman

Page 2: Rethinking the database for the cloud (iJAWS)

Traditional architecture

Client

Application

Relational database

Page 3: Rethinking the database for the cloud (iJAWS)

Problems with this approach

Client

Application

Relational database

• It doesn’t scale
• Management is hard
• High cost
• Low performance
• Migration is difficult

Page 4: Rethinking the database for the cloud (iJAWS)

Why do we get these problems?

When all you have is a hammer, everything looks like a nail

Client

Application

Relational database

Page 5: Rethinking the database for the cloud (iJAWS)

Rethinking the architecture

Client

Application

Data

Search

NoSQL SQL DWH

Cache

Hadoop

BlobStore

ETL

Page 6: Rethinking the database for the cloud (iJAWS)

AWS service and use case mapping

Use case → AWS service
• Search → Amazon CloudSearch
• NoSQL → DynamoDB
• SQL → Amazon RDS
• DWH → Amazon Redshift
• Cache → ElastiCache
• Hadoop → Amazon EMR
• Blob store → Amazon S3
• ETL → AWS Data Pipeline

Page 7: Rethinking the database for the cloud (iJAWS)

Sample references

Page 8: Rethinking the database for the cloud (iJAWS)

Social gaming

[Diagram: mobile client, Elastic Load Balancer, Auto Scaling group, DynamoDB, log files in Amazon S3, Amazon Elastic MapReduce]

Social games generate a large number of transactions, all of which require high performance and extreme scalability.

① Player data is stored in Amazon DynamoDB, which scales in both data volume and performance.
② Long-term usage log files are sent in parallel to Amazon S3 for unlimited, cheap storage.
③ Big data analytics are done in Amazon EMR, which integrates easily with both DynamoDB and S3.
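The write path for player data and usage logs can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the deck's implementation: the function name, the `player_id` attribute and the `logs/...` key layout are hypothetical, and `table` and `s3_client` are assumed to be boto3-style objects (a DynamoDB `Table` resource exposing `put_item` and an S3 client exposing `put_object`) supplied by the caller.

```python
import json


def record_player_event(table, s3_client, bucket, player_id, state, event):
    """Store the current player state in DynamoDB and the raw event in S3.

    `table` is assumed to behave like a boto3 DynamoDB Table resource and
    `s3_client` like a boto3 S3 client. Both are injected so the function
    can be exercised without AWS credentials.
    """
    # Hot data: the latest player state, keyed by a hypothetical "player_id".
    item = {"player_id": player_id}
    item.update(state)
    table.put_item(Item=item)

    # Cold data: the raw usage log, one object per event
    # (hypothetical key layout).
    key = "logs/{}/{}.json".format(player_id, event["timestamp"])
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

The two writes are issued one after the other here for simplicity, whereas the slide describes sending the logs in parallel. A real service would more likely batch log lines into larger objects before uploading them.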

Page 9: Rethinking the database for the cloud (iJAWS)

E-commerce site

[Diagram: end users, Auto Scaling group, ElastiCache, RDS (master), RDS (slave), Amazon DynamoDB, Amazon CloudSearch]

Requirements: high availability, search performance, and the flexibility to rapidly change data structures to fit new business requirements.

① For high-performance, low-latency responses, check the cache in ElastiCache first.
② Order and customer information is stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
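The "cache first" read path is commonly implemented as a cache-aside lookup. The sketch below shows one such pattern and is not taken from the deck: `cache` is assumed to expose redis-py style `get(key)` and `set(key, value, ex=ttl)` methods, while `load_from_db`, the `item:` key prefix and the 300-second TTL are hypothetical choices made for illustration.

```python
import json


def get_item_metadata(cache, load_from_db, item_id, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the database.

    `cache` is assumed to offer redis-py style get/set; `load_from_db` is a
    hypothetical callable that fetches the item from the backing store
    (RDS or DynamoDB) and returns a JSON-serializable dict or None.
    """
    key = "item:{}".format(item_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip

    item = load_from_db(item_id)   # cache miss: read from the database
    if item is not None:
        # Populate the cache so the next read is served from memory.
        cache.set(key, json.dumps(item), ex=ttl_seconds)
    return item
```

The TTL bounds how long stale metadata can be served after an update. Writers that need fresher reads would also delete or overwrite the cache key when they change an item.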

Page 10: Rethinking the database for the cloud (iJAWS)

How do I know which service to pick?

The “data temperature” method

Page 11: Rethinking the database for the cloud (iJAWS)

What is “data temperature”?

Data      ?

http://www.amazon.co.jp/dp/B0016V9FCQ

Page 12: Rethinking the database for the cloud (iJAWS)

Data temperature

               Hot          Warm       Cold
Volume         MB ~ GB      GB ~ TB    PB
Item size      B ~ KB       KB ~ MB    KB ~ TB
Latency        ms           ms-s       min-hr
Durability     Low-high     High       Very high
Request rate   Very high    High       Low
Cost/GB        $$~$         $~¢¢       ¢

The temperature of the data will vary depending on its format and use.
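As a rough illustration, the latency row of the table can be turned into a small classifier. The table gives only orders of magnitude (ms, ms-s, min-hr), so the exact cut-offs and the function name below are assumptions chosen for the example, not values from the slides.

```python
def data_temperature(latency_seconds):
    """Map a required response latency to "hot", "warm" or "cold".

    The thresholds are illustrative assumptions: the source table only
    says hot data is served in milliseconds, warm data in milliseconds to
    seconds, and cold data in minutes to hours.
    """
    if latency_seconds < 0.01:   # assumed: below 10 ms
        return "hot"
    if latency_seconds < 60:     # assumed: anything under a minute
        return "warm"
    return "cold"


print(data_temperature(0.002))   # hot
print(data_temperature(0.5))     # warm
print(data_temperature(3600))    # cold
```

In practice the other rows (volume, item size, request rate, cost) would be weighed as well. Latency is used alone here only to keep the sketch short.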

Page 13: Rethinking the database for the cloud (iJAWS)

The AWS service heat map

[Figure: heat map placing Amazon ElastiCache, Amazon CloudSearch, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift and Amazon EMR along four axes: data volume, latency, cost/GB and request rate, each running between low and high]

Page 14: Rethinking the database for the cloud (iJAWS)

How do I know which service to pick?

The cost estimation method

Page 15: Rethinking the database for the cloud (iJAWS)

Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

• “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”

Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
300                       2048                  1483                    777,600,000
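The monthly object count and total size follow from the request rate and object size alone. The sketch below assumes a 30-day month and binary gigabytes (1 GB = 1024³ bytes), which is the combination that reproduces the slide's figures; the function and constant names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month
BYTES_PER_GB = 1024 ** 3               # assumes binary gigabytes


def objects_per_month(writes_per_second):
    """Number of objects written in a month at a constant write rate."""
    return writes_per_second * SECONDS_PER_MONTH


def total_gb_per_month(writes_per_second, object_size_bytes):
    """Data volume written in a month, in GB."""
    total_bytes = objects_per_month(writes_per_second) * object_size_bytes
    return total_bytes / BYTES_PER_GB


print(objects_per_month(300))              # 777600000
print(int(total_gb_per_month(300, 2048)))  # 1483
```

These are the inputs a pricing calculator needs: the object count drives per-request charges, and the total size drives storage charges.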

Page 16: Rethinking the database for the cloud (iJAWS)

Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

• Time for …

※ : http://calculator.s3.amazonaws.com/index.html?lng=ja_JP

Page 17: Rethinking the database for the cloud (iJAWS)

Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
300                       2048                  1483                    777,600,000

DynamoDB monthly cost: $669.56
Amazon S3 monthly cost: $4325.33

DynamoDB is cheaper ($669.56 < $4325.33).

Page 18: Rethinking the database for the cloud (iJAWS)

Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

             Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month   Winner
Scenario 1   300                       2048                  1483                    777,600,000         DynamoDB
Scenario 2   300                       32,768                23,730                  777,600,000         Amazon S3
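One plausible reason the winner flips is that DynamoDB write throughput is provisioned in units tied to item size, while the number of S3 requests stays the same in both scenarios. The sketch below assumes the standard DynamoDB provisioned-throughput model, in which one write capacity unit covers one write per second of an item up to 1 KB. It includes no prices, and because the slides do not show the calculator inputs it is an explanation under that assumption rather than the author's own calculation.

```python
import math

WRITE_UNIT_BYTES = 1024  # one write capacity unit covers an item of up to 1 KB


def write_capacity_units(writes_per_second, item_size_bytes):
    """Provisioned write capacity needed for a steady write rate."""
    units_per_write = math.ceil(item_size_bytes / WRITE_UNIT_BYTES)
    return writes_per_second * units_per_write


print(write_capacity_units(300, 2048))   # 600
print(write_capacity_units(300, 32768))  # 9600
```

Going from 2 KB to 32 KB items multiplies the required write capacity by 16, whereas the S3 request count remains 777,600,000 per month and only the stored volume grows.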

Page 19: Rethinking the database for the cloud (iJAWS)

Summary

Page 20: Rethinking the database for the cloud (iJAWS)

Summary

• The era of relational-database-only, on-premises architecture is over.

• Performance, reliability, and scalability can all be improved by the cloud, but choosing the right architecture is a must.

• There are several ways of choosing the right service for the job
  – Use the “data temperature” and use case
  – Use the reverse cost estimate method
  – Ask AWS sales

Page 21: Rethinking the database for the cloud (iJAWS)

When in doubt, contact us

https://aws.amazon.com/jp/contact-us/

Page 22: Rethinking the database for the cloud (iJAWS)

APPENDIX
AWS database services - introduction and best practices

Page 23: Rethinking the database for the cloud (iJAWS)

Amazon RDS
A fully managed relational database service

• Create and scale with a few clicks
• Automated backups every 5 minutes for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery

[Diagram: master in Availability Zone A and slave in Availability Zone B, with data synchronization, automatic failover and automated backup]

Page 24: Rethinking the database for the cloud (iJAWS)

Amazon RDS
A fully managed relational database service

When to use
• Transactions
• Complex queries
• Medium to high query/write rate
  – Up to 30 K IOPS (15 K reads + 15 K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability

When not to use
• Massive read/write rates
  – Example: 150 K write requests per second
• Data size or throughput demands sharding
  – Example: 10s or 100s of terabytes
• Simple Get/Put and queries that a NoSQL store can handle
• Complex analytics

Page 25: Rethinking the database for the cloud (iJAWS)

DynamoDB
Fully managed NoSQL service

• Easy administration and high availability
  – No SPOF
  – Data is replicated into 3 availability zones
  – Storage scales, and data is automatically partitioned
• No limit on storage
  – Only pay for the storage you use
  – No need to add nodes or disks as storage grows

[Diagram: client accessing DynamoDB within a region]

Page 26: Rethinking the database for the cloud (iJAWS)

DynamoDB
Fully managed NoSQL service

When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries

When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data

Page 27: Rethinking the database for the cloud (iJAWS)

Amazon Redshift
Fully managed data warehouse service

• DWH as a Service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
• Scalable: 160 GB to petabytes
• Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
• Low cost: No initial cost, no license fees, and you only pay for what you use.

[Diagram: BI tools connect over JDBC/ODBC to the leader node, which is the SQL endpoint (parallel queries, create results); the leader node and the compute nodes are linked by a 10GigE mesh, and compute nodes can be added; integration with S3, DynamoDB and EMR]

Page 28: Rethinking the database for the cloud (iJAWS)

Amazon Redshift
Fully managed data warehouse service

When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column based
• Very high durability

When not to use
• OLTP workloads
  – 1000s of concurrent users
  – Large number of singleton updates

Page 29: Rethinking the database for the cloud (iJAWS)

Amazon S3
Low-cost, highly reliable object storage service

User side
• Only put data, with no need to worry about scalability, infrastructure, volume expansion etc.
• Only pay for what you use. Example: 1 GB/month costs about 3 yen

Infrastructure side
• Never lose data, with 99.999999999% durability
• Data automatically replicated
• Choose from over 9 regions globally

[Diagram: Files A, B and C replicated across Datacenters A, B and C]

Page 30: Rethinking the database for the cloud (iJAWS)

Amazon S3
Low-cost, highly reliable object storage service

When to use
• Store large objects
• Key-value store - Get/Put/List
• Unlimited storage
• Versioning
• Very high durability
  – 99.999999999%
• Very high throughput (via parallel clients)
• Use for storing persistent data
  – Backups
  – Source/target for EMR
  – Blob store with metadata in SQL or NoSQL

When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions