Introducing the Amazon Redshift Data Warehouse - Getting Deep with Amazon Redshift Data Warehouse...

Posted on 14-Jan-2015


DESCRIPTION

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.

TRANSCRIPT

Rahul Pathak, Senior Product Manager, Amazon Redshift

@rahulpathak #redshift

Data warehousing done the AWS way

• No upfront costs, pay as you go

• Really fast performance at a really low price

• Open and flexible with support for popular tools

• Easy to provision and scale up massively

We set out to build a fast and powerful, petabyte-scale data warehouse, delivered as a managed service, that is:

• A lot faster

• A lot cheaper

• A lot simpler

Amazon Redshift

We’re off to a good start

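Amazon Redshift dramatically reduces I/O

| ID | Age | State |
|---|---|---|
| 123 | 20 | CA |
| 345 | 25 | WA |
| 678 | 40 | FL |

(Figure: the rows above laid out in row storage vs. column storage, with the scan direction for each.)

Row storage keeps all fields of a record together on disk, so reading a single column still means reading every row in full. Column storage keeps each column's values together, so a query that touches only a few columns reads only those columns' blocks.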
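To make the I/O argument concrete, here is a minimal Python sketch built on the three sample rows above. It is a simplified model and an assumption, not Redshift's storage engine: it counts individual values touched rather than disk blocks, and the helpers `row_layout`, `column_layout`, and `values_scanned` are hypothetical names.

```python
# Simplified model (assumption): count stored values touched, not real disk blocks.
COLUMNS = ("ID", "Age", "State")
ROWS = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]


def row_layout(rows):
    """Row storage: all fields of one record are stored next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column storage: all values of one column are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def values_scanned(rows, column, columnar):
    """How many stored values a scan touches to read a single column."""
    if columnar:
        # Only the requested column's contiguous values are read.
        return len(column_layout(rows)[column])
    # Fields are interleaved, so every field of every record is read.
    return len(row_layout(rows))


print("row store:", values_scanned(ROWS, "Age", columnar=False))    # 9
print("column store:", values_scanned(ROWS, "Age", columnar=True))  # 3
```

With three columns the columnar scan touches a third of the values; the saving grows with table width, since a row store always reads every column.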

Amazon Redshift automatically compresses your data

• Compression saves space and reduces disk I/O

• COPY automatically analyzes and compresses your data

– Samples the data and selects the best compression encoding

– Supports byte dictionary, delta, mostly n, run-length, and text encodings

• Customers see 4-8x space savings with real data

– 20x and higher is possible, depending on the data set

• Use ANALYZE COMPRESSION to see details

```
analyze compression listing;

  Table  |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
```
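Two of the encodings listed above, delta and run-length, are easy to illustrate. The sketch below is a simplified Python model, not Redshift's on-disk format: it works on Python lists and does not model byte widths, and the sample `list_ids` and `states` values are made up. The space saving comes from the encoded form being smaller. Sequential IDs turn into small deltas that fit in fewer bytes, and long runs of a repeated value collapse into a single pair.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each value's difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [curr - prev for prev, curr in zip(values, values[1:])]


def delta_decode(encoded):
    """A running sum of the deltas restores the original values."""
    return list(accumulate(encoded))


def run_length_encode(values):
    """Collapse each run of repeated values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand each (value, run_length) pair back into a run of values."""
    return [value for value, count in pairs for _ in range(count)]


list_ids = [1001, 1002, 1003, 1005, 1006]      # hypothetical sample data
states = ["CA", "CA", "CA", "WA", "WA", "FL"]  # hypothetical sample data

print(delta_encode(list_ids))     # [1001, 1, 1, 2, 1]
print(run_length_encode(states))  # [('CA', 3), ('WA', 2), ('FL', 1)]
```

Both encodings are lossless, so decoding returns the original column exactly. Which one pays off depends on the data, which is why COPY samples the data before choosing.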

Amazon Redshift architecture

• Leader Node – SQL endpoint

– Stores metadata

– Coordinates query execution

• Compute Nodes – Local, columnar storage

– Execute queries in parallel

– Load, backup, restore via Amazon S3

– Parallel load from Amazon DynamoDB

• Single node version available

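(Architecture diagram: client tools connect to the leader node over JDBC/ODBC; the compute nodes are interconnected by 10 GigE (HPC) networking and handle ingestion, backup, and restore.)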

Amazon Redshift runs on optimized hardware

HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed user storage

• Optimized for I/O-intensive workloads

• High disk density

• Runs in an HPC environment with a fast network

• HS1.8XL also available as an Amazon EC2 instance type

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup

• Restore

• Resize
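The leader/compute split described in the architecture section is what makes parallel queries possible. The leader sends work to every compute node, each node processes only its own rows, and the leader merges the partial results. Here is a minimal Python sketch of that scatter/gather pattern, assuming threads stand in for compute nodes. `compute_node_scan`, `leader_query_avg_age`, and the sample rows are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_scan(node_rows, state):
    """One simulated compute node: filter its own rows and pre-aggregate them."""
    ages = [age for _id, age, row_state in node_rows if row_state == state]
    # Return (sum, count) rather than an average so the leader can merge correctly.
    return sum(ages), len(ages)


def leader_query_avg_age(node_slices, state):
    """Simulated leader node: fan the scan out to all nodes, then combine partials."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(lambda rows: compute_node_scan(rows, state), node_slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


node_slices = [
    [(123, 20, "CA"), (901, 30, "CA")],  # rows stored on node 1
    [(345, 25, "WA"), (678, 40, "FL")],  # rows stored on node 2
    [(555, 50, "CA")],                   # rows stored on node 3
]
print(leader_query_avg_age(node_slices, "CA"))
```

Each node returns a sum and a count because averaging the per-node averages would be wrong when nodes hold different numbers of matching rows.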


Amazon Redshift lets you start small and grow big

• Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores

– Single node (2 TB)

– Cluster of 2-32 nodes (4 TB to 64 TB)

• Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE

– Cluster of 2-100 nodes (32 TB to 1.6 PB)

Note: Nodes not to scale
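The capacity ranges above are simply nodes × storage per node. The short Python sketch below checks them using the slide's figures. The helper name `cluster_capacity_tb` is hypothetical, and so is the rule that only HS1.XL runs as a single node, which is inferred from this slide's layout. It uses decimal units (1 PB = 1,000 TB).

```python
# Per-node compressed user storage and cluster limits, taken from the slide above.
NODE_TYPES = {
    "HS1.XL": {"storage_tb": 2, "single_node": True, "max_cluster": 32},
    "HS1.8XL": {"storage_tb": 16, "single_node": False, "max_cluster": 100},
}


def cluster_capacity_tb(node_type, nodes):
    """Total compressed user storage, in TB, for `nodes` nodes of one type."""
    spec = NODE_TYPES[node_type]
    valid = (nodes == 1 and spec["single_node"]) or 2 <= nodes <= spec["max_cluster"]
    if not valid:
        raise ValueError(f"{nodes} nodes is not a valid {node_type} configuration")
    return spec["storage_tb"] * nodes


print(cluster_capacity_tb("HS1.XL", 1))     # 2
print(cluster_capacity_tb("HS1.XL", 32))    # 64
print(cluster_capacity_tb("HS1.8XL", 100))  # 1600, i.e. 1.6 PB
```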

Amazon Redshift is priced to let you analyze all your data

| Pricing option | Price per hour, HS1.XL single node | Effective hourly price per TB | Effective annual price per TB |
|---|---|---|---|
| On-Demand | $0.850 | $0.425 | $3,723 |
| 1 Year Reservation | $0.500 | $0.250 | $2,190 |
| 3 Year Reservation | $0.228 | $0.114 | $999 |

Simple pricing

• Number of nodes × cost per hour

• No charge for the leader node

• No upfront costs

• Pay as you go
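The per-TB columns in the pricing table follow from the per-node price. Each HS1.XL node holds 2 TB, so the hourly price is halved, and there are 8,760 hours in a (non-leap) year. A cluster's hourly cost is just nodes × price, since the leader node is free. The Python sketch below reproduces those figures. The function names are hypothetical, and the prices are the ones quoted in the table, not current AWS pricing.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years
TB_PER_HS1_XL_NODE = 2     # compressed user storage per HS1.XL node

# Hourly price per HS1.XL node, as quoted in the pricing table above.
HOURLY_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def cluster_hourly_cost(compute_nodes, price_per_node_hour):
    """Number of nodes x cost per hour; the leader node is not billed."""
    return compute_nodes * price_per_node_hour


def annual_price_per_tb(price_per_node_hour, tb_per_node=TB_PER_HS1_XL_NODE):
    """Effective hourly price per TB, multiplied out over a full year."""
    return price_per_node_hour / tb_per_node * HOURS_PER_YEAR


for option, price in HOURLY_PRICE.items():
    print(f"{option}: ${annual_price_per_tb(price):,.0f} per TB per year")
```

Rounded to whole dollars this prints $3,723, $2,190, and $999, matching the table. The 3-year figure is $998.64 before rounding.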

Amazon Redshift is easy to use

• Provision in minutes

• Monitor query performance

• Point-and-click resize

• Built-in security

• Automatic backups

(Screenshots: provisioning a data warehouse in minutes, monitoring query performance, and point-and-click resize.)

Resize your cluster while remaining online

• New target cluster provisioned in the background

• Fully automated

– Data automatically redistributed

• Read-only mode during resize

• Parallel node-to-node data copy

• Automatic DNS-based endpoint cutover

• Charged only for the source cluster during the resize
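Redistribution during a resize can be pictured as rebuilding the data layout for the new node count while the source cluster stays intact and read-only. The Python sketch below illustrates that idea under a loud assumption. The placement rule `key % node_count` in the hypothetical helper `node_for` is a stand-in chosen for simplicity, not Redshift's actual distribution logic. `layout_for` and `redistribute` are also hypothetical names.

```python
def node_for(key, node_count):
    """Hypothetical placement rule (assumption): key modulo the number of nodes."""
    return key % node_count


def layout_for(keys, node_count):
    """Initial layout: node index -> list of keys stored on that node."""
    layout = {node: [] for node in range(node_count)}
    for key in keys:
        layout[node_for(key, node_count)].append(key)
    return layout


def redistribute(source_layout, new_node_count):
    """Copy every row from the source nodes into a new target layout.

    The source layout is only read, never modified, mirroring a source
    cluster that stays available in read-only mode until cutover.
    """
    target = {node: [] for node in range(new_node_count)}
    for node_rows in source_layout.values():
        for key in node_rows:
            target[node_for(key, new_node_count)].append(key)
    return target


source = layout_for(range(100), 2)
target = redistribute(source, 4)
print({node: len(rows) for node, rows in target.items()})  # {0: 25, 1: 25, 2: 25, 3: 25}
```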

Amazon Redshift has security built-in

• SSL to secure data in transit

• Encryption to secure data at rest

– AES-256; hardware accelerated

– All blocks on disk and in Amazon S3 are encrypted

• No direct access to compute nodes

• Amazon VPC support

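(Diagram: clients in the customer VPC connect over JDBC/ODBC; the compute nodes, linked by 10 GigE (HPC) networking, sit in an internal VPC and handle ingestion, backup, and restore.)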

Amazon Redshift continuously backs up your data and recovers from failures

• Replication within the cluster and backups to Amazon S3 maintain multiple copies of your data at all times

• Backups to Amazon S3 are continuous, automatic, and incremental

– Designed for eleven nines (99.999999999%) of durability

• Continuous monitoring and automated recovery from drive and node failures

• Snapshots can be restored to any Availability Zone within a region
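An incremental backup uploads only the blocks that changed since the previous snapshot. The Python sketch below shows the idea under stated assumptions. The slides do not describe how Redshift detects changed blocks, so fingerprinting each block with SHA-256 is a hypothetical choice here, as are the helper names and the sample block contents.

```python
import hashlib


def block_digest(data: bytes) -> str:
    """Fingerprint a block so an unchanged block can be recognized later."""
    return hashlib.sha256(data).hexdigest()


def incremental_snapshot(blocks, previous_digests):
    """Return (block ids to upload, digests describing this snapshot).

    Only blocks that are new or whose contents changed since the previous
    snapshot are uploaded; unchanged blocks are already in backup storage.
    """
    current = {block_id: block_digest(data) for block_id, data in blocks.items()}
    to_upload = sorted(
        block_id for block_id, digest in current.items()
        if previous_digests.get(block_id) != digest
    )
    return to_upload, current


blocks = {1: b"listing rows 1-1000", 2: b"listing rows 1001-2000", 3: b"sales rows 1-500"}
first_upload, snapshot1 = incremental_snapshot(blocks, previous_digests={})
blocks[2] = b"listing rows 1001-2000 (updated)"
second_upload, snapshot2 = incremental_snapshot(blocks, previous_digests=snapshot1)
print(first_upload)   # [1, 2, 3]
print(second_upload)  # [2]
```

The first snapshot has no history, so every block is uploaded. After that, only the modified block goes to backup storage.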

Amazon Redshift integrates with multiple data sources

• Amazon DynamoDB

• Amazon Elastic MapReduce

• Amazon Simple Storage Service (S3)

• Amazon Elastic Compute Cloud (EC2)

• AWS Storage Gateway Service

• Corporate data center

• Amazon Relational Database Service (RDS)

More coming soon…

Amazon Redshift provides multiple data loading options

• Upload to Amazon S3

• AWS Import/Export

• AWS Direct Connect

• Work with a partner

– Data integration

– Systems integrators

More coming soon…

Amazon Redshift works with your existing analysis tools

(Diagram: your existing analysis tools connect to Amazon Redshift over JDBC/ODBC.)

More coming soon…

Customer use case: Accordant Media

Resources & Questions

• Rahul Pathak | rapathak@amazon.com | @rahulpathak

• http://aws.amazon.com/redshift

• https://aws.amazon.com/marketplace/redshift/

• https://www.jaspersoft.com/webinar-AWS-Agile-Reporting-and-Analytics-in-the-Cloud
