![Page 1: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/1.jpg)
Introduction to Amazon Redshift
May, 2014 / Abdullah Cetin CAVDAR @accavdar
![Page 2: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/2.jpg)
What's Amazon Redshift?

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud
https://aws.amazon.com/redshift/
![Page 3: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/3.jpg)
Features
- Petabyte scale, massively parallel
- Relational data warehouse
- Fully managed, zero admin
- SSD and HDD platforms
- $999/TB/Year
![Page 4: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/4.jpg)
Architecture
![Page 5: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/5.jpg)
Client Applications
- Integrates with various data loading and ETL (Extract, Transform, and Load) tools and business intelligence (BI) reporting, data mining, and analytics tools
- Redshift is based on industry-standard PostgreSQL, so most existing SQL client applications will work with only minimal changes
![Page 6: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/6.jpg)
Connections
- Redshift communicates with client applications by using industry-standard PostgreSQL JDBC and ODBC drivers
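
Because the drivers are the standard PostgreSQL ones, a connection is typically described with an ordinary PostgreSQL-style connection string. The sketch below only builds such strings and does not open a connection. The endpoint, database name, and user are hypothetical placeholders, and port 5439 is assumed as the Redshift default; the slides do not give any of these values.

```python
# Assumption: 5439 is the default port a Redshift cluster listens on.
REDSHIFT_DEFAULT_PORT = 5439


def libpq_dsn(host, dbname, user, port=REDSHIFT_DEFAULT_PORT):
    """Build a libpq keyword/value connection string (used by ODBC-style
    and PostgreSQL-native clients). sslmode=require asks for SSL in transit."""
    return f"host={host} port={port} dbname={dbname} user={user} sslmode=require"


def jdbc_url(host, dbname, port=REDSHIFT_DEFAULT_PORT):
    """Build a JDBC URL in the PostgreSQL driver's format."""
    return f"jdbc:postgresql://{host}:{port}/{dbname}"


# Hypothetical cluster endpoint, database, and user, for illustration only.
host = "examplecluster.abc123.us-east-1.redshift.amazonaws.com"
dsn = libpq_dsn(host, "dev", "analyst")
url = jdbc_url(host, "dev")
```

The password is deliberately left out of both strings; a client would normally supply it separately.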
![Page 7: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/7.jpg)
Clusters
- A cluster is composed of one or more compute nodes
- Leader Node coordinates the compute nodes and handles external communication
![Page 8: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/8.jpg)
Leader Node
- Manage communications with client programs and communications with compute nodes
- Store metadata
- Coordinate query execution
![Page 9: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/9.jpg)
Compute Nodes
- Execute the compiled code and send intermediate results back to the leader node for final aggregation
- Each compute node has its own dedicated CPU, memory, and attached disk storage, which are determined by the node type
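
The division of labor between the leader node and the compute nodes can be pictured as a scatter-gather computation. The following is a toy sketch, not Redshift's actual execution engine: threads stand in for compute nodes, plain lists stand in for each node's local data, and the function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Stand-in for one compute node: aggregate only its local data."""
    total = 0
    count = 0
    for amount in rows:
        total += amount
        count += 1
    return total, count


def leader_node_average(slices):
    """Stand-in for the leader node: fan the work out to every compute
    node, then combine the intermediate results into the final answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three hypothetical compute nodes, each holding part of one column.
slices = [[10, 20, 30], [40, 50], [60]]
average = leader_node_average(slices)
```

Each node returns a small (sum, count) pair rather than its raw rows, which is why the final aggregation on the leader stays cheap.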
![Page 10: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/10.jpg)
Databases
- A cluster contains one or more databases
- User data is stored on the compute nodes
- Amazon Redshift is a Relational Database Management System (RDBMS)
- Amazon Redshift is optimized for high-performance analysis and reporting of very large datasets
- Amazon Redshift is based on PostgreSQL
![Page 11: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/11.jpg)
Redshift reduces I/O
- Column storage - read only the data you need
- Data compression - analyzes and compresses your data
- Zone Map
  - Keep track of minimum and maximum value for each block
  - Skip over blocks that don't contain data needed for a given query
  - Minimize unnecessary I/O
- Direct attached storage
  - Hardware optimized for high performance data processing
- Large data block sizes
  - Large block sizes to make the most of each read
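
The zone map idea can be illustrated with a small in-memory model. This is a simplified sketch rather than Redshift's on-disk format: the `Block` class, the block size of 4, and the sample date values are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list
    min_value: int  # zone map entry: smallest value stored in the block
    max_value: int  # zone map entry: largest value stored in the block


def build_blocks(column, block_size):
    """Split one column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks, low, high):
    """Return values in [low, high], reading only blocks that can match."""
    matches = []
    blocks_read = 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # zone map proves the block is irrelevant: skip the read
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Twelve sorted date keys (hypothetical data), stored four per block.
order_dates = list(range(20140101, 20140113))
blocks = build_blocks(order_dates, block_size=4)
matches, blocks_read = scan_range(blocks, 20140105, 20140106)
```

Here only one of the three blocks is read. The skipping works well because the values are stored in sorted order; with randomly ordered values most blocks would span the whole range and few could be skipped.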
![Page 12: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/12.jpg)
Redshift runs on optimized hardware
- Optimized for I/O intensive workloads
- High disk density
- Runs in HPC - fast network
![Page 13: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/13.jpg)
Redshift parallelizes and distributes everything
- Query
- Load
- Backup/Restore
- Resize
![Page 14: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/14.jpg)
Redshift is easy to use
- Provision in minutes
- Monitor query performance
- Point and click resize
- Built in security
- Automatic backups
![Page 15: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/15.jpg)
Redshift has security built-in
- SSL to secure data in transit
- Encryption to secure data at rest
  - AES 256 - hardware accelerated
  - All blocks on disk and in Amazon S3 encrypted
- No direct access to compute nodes
- Amazon VPC support
![Page 16: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/16.jpg)
Redshift backs up your data and recovers from failures
- Replication within the cluster and backup to Amazon S3
- Backups to Amazon S3 are continuous, automatic, and incremental
- Continuous monitoring and automated recovery from failures
- Able to restore snapshots to any Availability Zone
![Page 17: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/17.jpg)
Use Cases
![Page 18: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/18.jpg)
Traditional Enterprise DW
- Reduce costs by extending DW rather than adding HW
- Migrate completely from existing DW systems
- Respond faster to business
![Page 19: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/19.jpg)
Companies with Big Data
- Improve performance by an order of magnitude
- Make more data available for analysis
- Access business data via standard reporting tools
![Page 20: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/20.jpg)
SaaS Companies
- Add analytic functionality to applications
- Scale DW capacity as demand grows
- Reduce HW and SW costs by an order of magnitude
![Page 22: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/22.jpg)
Data Architecture
![Page 23: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/23.jpg)
Redshift Implementation
- High Storage Extra Large (XL) DW Node
- ETL Activities
  - Approx. 90 minutes including exports from RDBMS, copying to S3, loading stage tables, loading target tables, vacuuming and analysing tables
- Schema
- Compression
- Retention
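
The load half of the ETL activities above (stage tables, target tables, vacuum, analyze) is commonly written as a fixed sequence of SQL statements. The sketch below only generates that sequence as text. It uses a generic staging-table merge pattern, not necessarily the one used in this implementation, and the table names, key column, S3 path, IAM role, and file format options are all hypothetical. The export from the RDBMS and the copy to S3 are not shown.

```python
def etl_statements(target, stage, s3_path, iam_role, key):
    """Return the SQL for one load cycle, in execution order."""
    return [
        # Start from an empty stage table.
        f"TRUNCATE {stage};",
        # Bulk-load the exported files from S3 into the stage table.
        # Assumption: gzip-compressed, pipe-delimited files and role-based auth.
        f"COPY {stage} FROM '{s3_path}' IAM_ROLE '{iam_role}' GZIP DELIMITER '|';",
        # Merge stage into target: replace rows that share the same key.
        "BEGIN;",
        f"DELETE FROM {target} USING {stage} WHERE {target}.{key} = {stage}.{key};",
        f"INSERT INTO {target} SELECT * FROM {stage};",
        "COMMIT;",
        # Reclaim space and re-sort, then refresh planner statistics.
        # VACUUM is issued after COMMIT because it cannot run inside a transaction.
        f"VACUUM {target};",
        f"ANALYZE {target};",
    ]


statements = etl_statements(
    target="orders",
    stage="stage_orders",
    s3_path="s3://example-bucket/exports/orders/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
    key="order_id",
)
```

A scheduler would run these statements one at a time over a normal SQL connection and stop at the first failure.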
![Page 24: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/24.jpg)
DW Anatomy
![Page 25: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/25.jpg)
Why Redshift works for SkillPages?
- Scale - MPP
- Performance - Columnar data access and compression
- Platform Integration - S3, Dynamo
- Operational Advantages
- Ease of Access
- Cost
![Page 26: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/26.jpg)
Best Practices
- Avoid large numbers of singleton Data Manipulation Language (DML) statements if possible
- Use COPY for uploading large datasets
- Choose SORT and DISTRIBUTION keys with care
- Encode date and time with TIMESTAMP data type
- Experiment with WLM (Workload Manager) settings
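
As an illustration of the key and data type advice, the sketch below generates a `CREATE TABLE` statement with an explicit distribution key, sort key, and a `TIMESTAMP` column. The table, its columns, and the choice of keys are hypothetical; good keys depend on the actual join and filter patterns of the workload.

```python
def create_table_ddl(table, columns, dist_key, sort_keys):
    """Build CREATE TABLE text with DISTKEY and SORTKEY table attributes.

    columns is a list of (name, sql_type) pairs.
    """
    names = [name for name, _ in columns]
    for key in [dist_key, *sort_keys]:
        if key not in names:
            raise ValueError(f"key column {key!r} is not defined in the table")
    column_sql = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n{column_sql}\n)\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


# Hypothetical table: distribute on the join column, sort on the time filter.
ddl = create_table_ddl(
    table="page_views",
    columns=[
        ("user_id", "BIGINT"),
        ("viewed_at", "TIMESTAMP"),  # date and time stored as TIMESTAMP, not text
        ("url", "VARCHAR(2048)"),
    ],
    dist_key="user_id",
    sort_keys=["viewed_at"],
)
```

Sorting on the timestamp column keeps rows for a given time range physically close together, so range filters on `viewed_at` touch fewer blocks.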
![Page 28: Introduction to Amazon Redshift](https://reader034.vdocument.in/reader034/viewer/2022051211/554f61f1b4c905c8088b4ad9/html5/thumbnails/28.jpg)
THE END

by Abdullah Cetin CAVDAR / @accavdar