A Peek at Data Storage Services in the Public Cloud
06-22-2018
What I do?
● Project Engineer at Pythian
● Cloud Architect
● DevOps Engineer
● Database Administrator - SQL Server
● Database Administrator - MySQL

Certifications
• AWS Certified DevOps Engineer - Professional
• AWS Certified Solutions Architect - Professional
• Google Cloud Certified Professional Cloud Architect
• Microsoft Certified Solutions Expert: Data Management and Analytics
• AWS Certified SysOps Administrator - Associate
• AWS Certified Developer - Associate
• AWS Certified Solutions Architect - Associate
• MS Certified IT Professional - SQL Server 2008
• Oracle Database 11g Administrator Certified Associate
SANDEEP ARORA
ABOUT THE PRESENTER
© 2016 Pythian. Confidential 2
Amazon RDS vs GCP CloudSQL vs Azure Database Services
DATA STORAGE What data storage options do public cloud vendors have to offer?
DATA STORAGE OPTIONS
What are the different types of data storage options available?
| | Block Storage | Object Store | RDBMS | NoSQL | Big Data | Data Warehouse |
|---|---|---|---|---|---|---|
| What is it? | Data stored on a filesystem | Computer storage system that organizes data into containers called objects | Relational database – tables, rows and columns | Wide variety of data models, key-value, document, columnar and graph formats | Large volume of data – both structured and unstructured | Integrated data from multiple heterogeneous sources for analytical reporting |
| Capacity | Terabytes + | Petabytes + | Gigabytes + | Terabytes + | Petabytes + | Petabytes + |
| Read | List, view | Have to copy to local disk | SELECT row | Filter objects on property | Scan rows | SELECT rows |
| Write | Create file, paste | One file | INSERT row | Put object | Put row | Batch/stream |
| Update Granularity | A file | An object (a file) | Field | Attribute | Row | Field |
| Usage | Install OS and applications | Store blobs | Normalized & ACID compliant data | Denormalized schema | Analysis and insights | Reporting and data analytics |
I have put a flow diagram together for choosing a storage service.
HOW TO PICK YOUR DATA STORAGE SERVICE?
DATA STORAGE SERVICES ON CLOUD
Let’s identify the services AWS, GCP and Microsoft Azure have to offer.
DATA STORAGE OPTIONS
| | AWS | GCP | Microsoft Azure |
|---|---|---|---|
| Block Storage | EBS | Persistent Disks | Azure Managed Disks |
| Object Store | Simple Storage Service (S3) | Google Cloud Storage | Blob Storage |
| RDBMS | RDS | CloudSQL & Spanner | Azure SQL Database Service & Azure Database Services |
| NoSQL | DynamoDB | DataStore | Cosmos DB |
| Data Warehouse | Amazon Redshift | BigQuery | Azure SQL Data Warehouse |
| Big Data | Amazon EMR | BigTable | HDInsight, Data Lake |
• All of these are fully managed Platform-as-a-Service offerings.
• GCP offers access to innovative resources not available anywhere else (big data, machine learning).
• Google’s big data capabilities are by far the best in comparison to any other vendor.
• If you are looking for a globally scalable and highly available RDBMS solution, then read about Spanner. This is one of a kind and I don’t know anyone else apart from Google who can implement something like this.
• What is the problem with using Google?
• If you are looking to use an RDBMS DBaaS offering on the cloud, then Amazon stands out by far and every other vendor has a lot of catching up to do. With support for six database engines, AWS RDS totally stands out.
• If your application supports SQL Server and you are looking for a managed service, then SQL Azure best serves the purpose. GCP currently does not support SQL Server as a service. AWS does support it, but since SQL Server is a Microsoft product, Microsoft does a better job of supporting it as a service.
This is my individual feedback based on my usage of these data storage services.
WHICH VENDOR\SERVICE DO I ENDORSE?
COMPARING ENGINES & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 0 | 0 | 0 |
AWS clearly supports the most database engines of the three services.
WHAT DATABASE ENGINES ARE SUPPORTED?
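| CLOUD VENDOR | Oracle | SQL Server | Amazon Aurora** | MySQL | Maria DB | PostgreSQL |
|---|---|---|---|---|---|---|
| Amazon RDS | Yes | Yes | Yes | Yes | Yes | Yes |
| GCP CloudSQL | - | - | - | Yes | - | Yes |
| Azure Database Services | - | Yes | - | Yes | - | Yes |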
** Amazon Aurora
• Relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
• Designed to offer greater than 99.99% availability.
• Compatible with MySQL and with PostgreSQL.
• Provides 5X the throughput of standard MySQL and 3X the throughput of standard PostgreSQL running on the same hardware.
COMPARING PRICING & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 1 | 0 | 0 |
Which vendor service costs the most?
CLOUD DATABASE SERVICES PRICING TIERS
• TYPE OF INSTANCE: BASIC, STANDARD, PREMIUM, MEMORY OPTIMIZED, CPU OPTIMIZED
• PURCHASE SCHEME: ON-DEMAND OR RESERVED
• DATA STORAGE & IO OPERATIONS
• DATA TRANSFERS
• HIGH AVAILABILITY CONFIGURATION
• READ REPLICAS
• BACKUP STORAGE
• LICENSING
Comparison using the same actual production implementations.
CLOUD DATABASE SERVICES PRICING TIERS EXAMPLE
| ATTRIBUTES | AWS RDS | AWS RDS | SQL AZURE | GCP CLOUD SQL |
|---|---|---|---|---|
| DATABASE | MYSQL | MYSQL | MYSQL | MYSQL |
| MEMORY | 64 GiB | 64 GiB | 64 GB | 60 GB |
| CPU | 16 Cores | 16 Cores | 16 Cores | 16 Cores |
| STORAGE (SSD) | 1024 GB | 1024 GB | 1024 GB | 1024 GB |
| HIGH AVAILABILITY | Yes | Yes | Yes | Yes + offers Read Replica as well |
| PURCHASING SCHEME | On-Demand | Reserved (1 Year) | On-Demand | On-Demand |
| DISCOUNT | None | 38%-48% | None | Sustained Use Discount (30%) |
| BACKUPS | 500 GB | 6 TB | 500 GB | 500 GB |
| MONTHLY COST | $2,565 approx. | $1,067.33 approx. | $3,585 approx. | $1,966.64 approx. |
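The monthly figures in this MySQL example can be compared directly. The following is a minimal Python sketch, not part of the original slides, that ranks the four configurations and expresses each cost as a multiple of the cheapest one. The dictionary labels and the helper names `rank_by_cost` and `times_cheapest` are illustrative assumptions; only the dollar amounts come from the table.

```python
# Approximate monthly costs (USD) copied from the MySQL pricing example.
# The labels are illustrative names, not official product SKUs.
MONTHLY_COST = {
    "AWS RDS (on-demand)": 2565.00,
    "AWS RDS (reserved, 1 year)": 1067.33,
    "SQL Azure (on-demand)": 3585.00,
    "GCP Cloud SQL (on-demand)": 1966.64,
}


def rank_by_cost(costs):
    """Return (name, cost) pairs sorted from cheapest to most expensive."""
    return sorted(costs.items(), key=lambda item: item[1])


def times_cheapest(costs):
    """Express every cost as a multiple of the cheapest configuration."""
    cheapest = min(costs.values())
    return {name: cost / cheapest for name, cost in costs.items()}


if __name__ == "__main__":
    ratios = times_cheapest(MONTHLY_COST)
    for name, cost in rank_by_cost(MONTHLY_COST):
        print(f"{name:28s} ${cost:>9,.2f}  {ratios[name]:.2f}x the cheapest")
```

The two AWS RDS columns differ in backup storage (500 GB versus 6 TB), so the ranking is only as like-for-like as the table itself.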
Comparison using the same actual production implementations.
CLOUD DATABASE SERVICES PRICING TIERS EXAMPLE 2

| ATTRIBUTES | AWS RDS | AWS RDS | AWS Aurora | AWS Aurora | SQL AZURE |
|---|---|---|---|---|---|
| DATABASE | SQL SERVER ENT. | SQL SERVER ENT. | Aurora | Aurora | SQL SERVER |
| MEMORY | 64 GiB | 64 GiB | 122 GiB | 122 GiB | 64 GB |
| CPU | 16 Cores | 16 Cores | 16 Cores | 16 Cores | 16 Cores |
| STORAGE (SSD) | 1024 GB | 1024 GB | 1024 GB | 1024 GB | 1024 GB |
| HIGH AVAILABILITY | Yes | Yes | Yes | Yes | Yes |
| PURCHASING SCHEME | On demand | Reserved (1 Year) | On demand | Reserved (1 Year) | On demand |
| DISCOUNT | None | 4% | None | 45%-65% | None |
| BACKUPS | 500 GB | 6 TB | 500 GB | 6 TB | 500 GB |
| MONTHLY COST | $15,239.56 approx. | $14,629.33 approx. | $4,061 approx. | $2,233 approx. | $3,059.24 approx. |
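As a worked check on this second example, the saving implied by each reserved price can be derived from the monthly costs. This is a hedged sketch rather than anything from the slides: the function name `implied_discount` and the `PAIRS` labels are assumptions made for illustration, and the inputs are the approximate figures from the MONTHLY COST row.

```python
def implied_discount(on_demand: float, reserved: float) -> float:
    """Percentage saved by the reserved price relative to the on-demand price."""
    if on_demand <= 0:
        raise ValueError("on_demand must be positive")
    return (on_demand - reserved) / on_demand * 100.0


# (on-demand, reserved 1 year) approximate monthly costs in USD.
PAIRS = {
    "RDS SQL Server Enterprise": (15239.56, 14629.33),
    "Aurora": (4061.00, 2233.00),
}

if __name__ == "__main__":
    for engine, (on_demand, reserved) in PAIRS.items():
        saving = implied_discount(on_demand, reserved)
        print(f"{engine}: {saving:.1f}% lower when reserved")
```

For these two pairs the result comes out at roughly 4% for SQL Server Enterprise and roughly 45% for Aurora, in line with the DISCOUNT row.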
• AWS is the cheapest vendor with reserved instance pricing.
• With on demand pricing, GCP is around 21% cheaper than other cloud vendors.
• SQL Server as a managed service is more affordable on Azure than any other cloud vendor.
• GCP offers sustained use discounts.
• No reserved pricing currently for Azure and GCP.
• License mobility supported for Oracle Databases in RDS.
• Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
Finally, which of the database services is the cheapest?
CLOUD DATABASE SERVICES PRICING TIERS - FINAL
COMPARING CLOUD MIGRATIONS & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 2 | 1 | 0 |
What are the various tools or options available to migrate to cloud database services?
HOW TO MIGRATE TO DBAAS?
Services and engines compared:
• Amazon Relational Database Services: Oracle, SQL Server, Aurora, MySQL, Maria DB, PostgreSQL
• Azure Database Services: SQL Server, MySQL, PostgreSQL
• GCP Cloud SQL: MySQL, PostgreSQL

Method:
• Import/Export
• Native Backup/Restore *
• Migration Service **
• Replication ***
* Oracle RMAN backup and restore is not supported but Oracle Data Pump is supported.
** AWS Database Migration Service or DMS and Azure Database Service. Both support heterogeneous & homogeneous migrations.
*** Ongoing replication from Amazon RDS for SQL Server using AWS Database Migration Service.
COMPARING BACKUPS & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 3 | 1 | 0.5 |
What types of backups are supported?
DATABASE BACKUPS ON THE CLOUD
| Backup Type | RDS | SQL AZURE | GCP CLOUD SQL |
|---|---|---|---|
| Automatic backups | Automatic snapshots | Built-in backups | Automated backups |
| Manual backups | Manual snapshots | DB export | On demand backups |
• Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. Snapshots are stored on S3, and backup storage is charged as per S3 usage. Backups are incremental. Backup retention is between 7 and 35 days.
• Basic, standard, and premium databases are backed up automatically in Azure SQL Database. These backups are retained for 7 days, 14 days, and 35 days respectively. Built-in backups come at no additional cost. Long term backup retention is also supported.
• Backups provide a way to restore your Cloud SQL instance to recover lost data or recover from a problem with your instance. Cloud SQL retains up to 7 automated backups for each instance. Cloud SQL backups are incremental; they contain only data that has changed since the previous backup was taken.
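The retention windows quoted in these bullets determine how far back a restore can reach. Below is a small Python sketch of that check, assuming the figures exactly as stated here (7, 14 and 35 days for the Azure tiers, 7 to 35 days for RDS); the function names are hypothetical and current provider limits may differ.

```python
from datetime import datetime, timedelta

# Retention windows in days as quoted in the bullets; treat them as
# illustrative values that may differ from a provider's current limits.
AZURE_SQL_RETENTION_DAYS = {"basic": 7, "standard": 14, "premium": 35}
RDS_RETENTION_RANGE_DAYS = (7, 35)


def azure_restore_possible(tier: str, target: datetime, now: datetime) -> bool:
    """True if `target` falls inside the built-in retention window for `tier`."""
    window = timedelta(days=AZURE_SQL_RETENTION_DAYS[tier.lower()])
    return now - window <= target <= now


def clamp_rds_retention(requested_days: int) -> int:
    """Clamp a requested RDS retention period into the range quoted above."""
    low, high = RDS_RETENTION_RANGE_DAYS
    return max(low, min(high, requested_days))


if __name__ == "__main__":
    now = datetime(2018, 6, 22)
    ten_days_ago = now - timedelta(days=10)
    for tier in AZURE_SQL_RETENTION_DAYS:
        print(tier, azure_restore_possible(tier, ten_days_ago, now))
    print(clamp_rds_retention(90))
```

A restore point ten days old falls outside the Basic window but inside the Standard and Premium windows, which is the practical difference between the tiers.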
COMPARING RESTORES & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 3 | 1 | 1.5 |
How can we restore our database when disaster strikes?
DATABASE RESTORE ON THE CLOUD
| ATTRIBUTES | AWS RDS | SQL AZURE | GCP CLOUD SQL |
|---|---|---|---|
| Point-in-time restore | Yes | Yes | Yes |
| Native backup file restore | Yes (except Oracle RMAN backup) | Yes (except SQL Server – unless you are using a managed instance) | Yes |
| Native export file restore | Yes | Yes | Yes |
| Overwrite existing database | No | No | Yes |
| Replicas state? | Delete before restore | Delete before restore | Delete before restore |
| Long-term retention? | Yes (35 days) | Yes (10 years) | No (7 days) |
COMPARING SECURITY FEATURES & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 3 | 1 | 1.5 |
What are the various security features that are available?
SECURING YOUR DATABASES IN THE CLOUD?
| | AWS RDS | SQL AZURE | GCP CLOUD SQL |
|---|---|---|---|
| Resource management permissions | IAM | Azure Active Directory | Cloud IAM |
| Encryption at rest | Yes (AES-256 encryption and TDE for Oracle databases) | Yes (TDE and enabled by default for SQL Server) | Yes (AES-256 or symmetric keys) |
| Encryption in data transit | Yes (SSL/TLS) | Yes (SSL/TLS) | Yes (Cloud SQL Proxy or SSL/TLS) |
| Auditing | Yes | Yes | Yes |
| Network firewall | Yes | Yes | Yes |
| Backup encryption | Yes | Yes | Yes |
| Security assessments | Trusted advisor | Vulnerability assessment | No |
| Database authentication and authorization | Yes | Yes | Yes |
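All three services list SSL/TLS for encryption in transit. On the client side, that usually starts with a TLS context that verifies the server certificate. The sketch below uses only Python's standard `ssl` module; the helper name `make_db_tls_context` and the optional `ca_file` path to a provider CA bundle are assumptions for illustration. How the context is handed to a database driver varies by driver and is not shown.

```python
import ssl
from typing import Optional


def make_db_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate.

    `ca_file` is a hypothetical path to the CA bundle published by the cloud
    provider; when omitted, the system trust store is used instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_db_tls_context()
    print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

`ssl.create_default_context()` already turns on certificate verification and hostname checking, so the only extra step here is raising the minimum protocol version.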
COMPARING DESIGN PATTERNS & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 4 | 1 | 2.5 |
What is the difference between each service when you design for high availability?
DATABASE HIGH AVAILABILITY – DESIGN PATTERN
Things to keep in mind when designing for high availability
DATABASE HIGH AVAILABILITY EXPLANATION

AWS RDS
• Synchronous replication – highly durable
• Only the database engine on the primary instance is active and accepts reads and writes.
• Always spans two availability zones within a single region
• Database engine version upgrades happen on the primary
• Automatic failover to standby when a problem is detected
• The failover replica is billed as a separate instance.

SQL AZURE
• Synchronous replication - at least two of those databases are synchronous
• The hardware these databases reside on is on completely physically separate subsystems within one datacenter
• Automatic failover to standby when a problem is detected on hardware.
• Zonal configuration (preview) – used with elastic pools and eligible for Premium tier only at no extra cost.
• The local replica is not billed as a separate instance. The cost is included.

GCP CLOUD SQL
• Synchronous replication.
• Automatic failover is triggered when the primary database suffers an outage or is unresponsive.
• The failover replica is billed as a separate instance.
• You can use the failover replica as a read replica, to offload read operations from the master.
CURRENT STANDINGS

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 5 | 2 | 2.5 |
How do we horizontally scale our DBaaS databases?
DATABASE SCALABILITY – READ REPLICAS
• You will pay for all regional data transfers as per applicable rates.
• AWS RDS doesn’t support Read Replicas for SQL Server and Oracle Database Engine.
• You will have to manually promote the databases in the case of an actual disaster.
• You will also need to update the connection strings.
• Azure Database Services also supports failover groups, which allow automatic failover to a secondary region in a disaster scenario. You will need to switch the connection strings or use a load balancer to take advantage of automatic failover. Both databases have a separate connection endpoint.
• Replication is asynchronous so there is a chance of data loss.
Read Replicas can also serve as a disaster recovery setup
READ REPLICAS FOR DISASTER RECOVERY
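The manual steps described on this slide (promote a replica, then update the connection strings) can be modelled in a few lines. This is a minimal sketch in Python, assuming a MySQL-style connection string on port 3306. The class `ReplicaSet` and the example hostnames are hypothetical and do not correspond to any provider API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaSet:
    """Tracks which endpoint the application should write to."""

    primary: str
    replicas: List[str] = field(default_factory=list)

    def promote(self, replica: str) -> None:
        """Manually promote `replica` to primary after a regional outage."""
        if replica not in self.replicas:
            raise ValueError(f"{replica} is not a known replica")
        self.replicas.remove(replica)
        # The failed primary is dropped from the set, not demoted.
        self.primary = replica

    def connection_string(self, database: str) -> str:
        """Build the connection string the application should use now."""
        return f"mysql://{self.primary}:3306/{database}"


if __name__ == "__main__":
    dbs = ReplicaSet("db-primary.example.internal", ["db-dr.example.internal"])
    print(dbs.connection_string("app"))
    dbs.promote("db-dr.example.internal")
    print(dbs.connection_string("app"))
```

Because replication is asynchronous, writes that had not reached the replica before promotion are lost; the sketch only covers the endpoint switch.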
CURRENT STANDINGS

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 6 | 3 | 4 |
How can we ensure that our databases are in a private network?
DATABASE DESIGN PATTERN - NETWORKING
CURRENT STANDINGS

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 7 | 3 | 4 |
How do we deal with read and write contention?
R\W CONTENTION – 3 TIER ARCHITECTURE - AWS
How do we deal with read and write contention?
R\W CONTENTION – 3 TIER ARCHITECTURE - AZURE
How do we deal with read and write contention?
R\W CONTENTION – 3 TIER ARCHITECTURE - GCP
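The architecture diagrams for these three slides are not reproduced in the transcript. As a generic illustration of one common way to reduce read/write contention, the Python sketch below sends writes to the primary and spreads reads across replicas in round-robin order. The class `ReadWriteRouter` and the endpoint names are hypothetical, and this is not necessarily the exact design shown on the slides.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Route SELECT statements to replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self._primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint based on the first keyword of the statement."""
        words = sql.split(None, 1)
        if words and words[0].upper() == "SELECT":
            return next(self._readers)
        return self._primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    for statement in ("SELECT 1", "select 2", "INSERT INTO t VALUES (1)"):
        print(statement, "->", router.endpoint_for(statement))
```

Keyword-based routing is deliberately naive: statements such as `SELECT ... FOR UPDATE`, or reads that must see a write made a moment earlier, belong on the primary, because replicas replicate asynchronously and can lag.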
COMPARING PERFORMANCE & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 8 | 4 | 5 |
Performance results for the same configuration across three platforms
PERFORMANCE METRICS
COMPARE MONITORING SETTINGS & KEEPING SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 8 | 5 | 5 |
How can we set up monitoring for our databases on the cloud?
MONITORING DATABASE SERVICES
| | AWS RDS | SQL AZURE | GCP CLOUDSQL |
|---|---|---|---|
| Monitoring tool | CloudWatch | OMS – Log Analytics | Stackdriver |
Free monitoring
Paid monitoring
Notifications
FINAL SCORE

| CLOUD SERVICE | RDS | GCP CLOUDSQL | SQL AZURE |
|---|---|---|---|
| SCORE | 9 | 6 | 6 |
THANK YOU
“Everyone claims to be working on cloud but only 10% of the people are actually leveraging the cloud. Running a VM on the cloud is not working on the cloud.”
Email: [email protected]
Phone: +91-9582073333