© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora Deep Dive
Debanjan Saha – GM, Amazon Aurora
Amazon Web Services
June, 2016
MySQL-compatible relational database
Performance and availability of
commercial databases
Simplicity and cost-effectiveness of
open source databases
Delivered as a managed service
What is Amazon Aurora?
Re-imagined for the cloud
Architected for the cloud – e.g. moved the
logging and storage layer into a
multitenant, scale-out database-optimized
storage service
Leverages existing AWS services: Amazon
EC2, Amazon VPC, Amazon DynamoDB,
Amazon SWF, and Amazon S3
Maintain compatibility with MySQL –
customers can migrate their MySQL
applications as-is, use all MySQL tools.
[Figure: architecture diagram showing the Control Plane and the Data Plane. Labels: Amazon DynamoDB, Amazon SWF, Amazon Route 53, SQL, Transactions, Caching, Logging + Storage, Amazon S3]
Reproducing benchmark results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Figure: R3.8XLARGE client instances and an R3.8XLARGE Amazon Aurora instance]
• Create an Amazon VPC (or use an existing one).
• Create four EC2 R3.8XL client instances to run the
SysBench client. All four should be in the same AZ.
• Enable enhanced networking on your clients
• Tune your Linux settings (see whitepaper)
• Install Sysbench version 0.5
• Launch an r3.8xlarge Amazon Aurora DB instance in
the same VPC and AZ as your clients
• Start your benchmark!
MySQL SysBench results
R3.8XL: 32 cores / 244 GB RAM
5X faster than RDS MySQL 5.6 & 5.7
Five times higher throughput than stock MySQL,
based on industry-standard benchmarks.
[Figure: two bar charts, WRITE PERFORMANCE (axis 0 to 150,000) and READ PERFORMANCE (axis 0 to 700,000), comparing Aurora, MySQL 5.6, and MySQL 5.7]
Scaling with instance sizes
Aurora scales with instance size for both read and write.
[Figure: WRITE PERFORMANCE and READ PERFORMANCE by instance size for Aurora, MySQL 5.6, and MySQL 5.7]
Beyond benchmarks
If only real world applications saw benchmark performance
POSSIBLE DISTORTIONS
Real world requests contend with each other
Real world metadata rarely fits in data dictionary cache
Real world data rarely fits in buffer cache
Real world production databases need to run with HA enabled
Scaling User Connections
SysBench OLTP workload, 250 tables

Connections   Amazon Aurora   RDS MySQL w/ 30K IOPS
50            40,000          10,000
500           71,000          21,000
5,000         110,000         13,000

Up to 8x faster
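As a quick check on the "up to 8x" figure, the ratios can be computed from the connection-scaling numbers above. This is a minimal sketch; the variable and function names are illustrative, and the values are copied from the table.

```python
# Throughput from the "Scaling User Connections" table:
# connections -> (Amazon Aurora, RDS MySQL w/ 30K IOPS).
results = {
    50: (40_000, 10_000),
    500: (71_000, 21_000),
    5_000: (110_000, 13_000),
}


def speedups(table):
    """Return {connections: Aurora throughput / RDS MySQL throughput}."""
    return {conns: aurora / mysql for conns, (aurora, mysql) in table.items()}


ratios = speedups(results)
for conns, ratio in ratios.items():
    print(f"{conns:>5} connections: {ratio:.1f}x")
```

The largest ratio is at 5,000 connections, where RDS MySQL throughput drops while Aurora throughput keeps rising.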
Scaling Table Count
SysBench write-only workload, 1,000 connections
Measuring writes per second

Tables   Amazon Aurora   MySQL I2.8XL local SSD   MySQL I2.8XL RAM disk   RDS MySQL w/ 30K IOPS (single AZ)
10       60,000          18,000                   22,000                  25,000
100      66,000          19,000                   24,000                  23,000
1,000    64,000          7,000                    18,000                  8,000
10,000   54,000          4,000                    8,000                   5,000

Up to 11x faster
Scaling Data Size

SYSBENCH WRITE-ONLY
DB Size   Amazon Aurora   RDS MySQL w/ 30K IOPS
1GB       107,000         8,400
10GB      107,000         2,400
100GB     101,000         1,500
1TB       26,000          1,200

Up to 21x faster

CLOUDHARMONY TPC-C
DB Size   Amazon Aurora   RDS MySQL w/ 30K IOPS
80GB      12,582          585
800GB     9,406           69

Up to 136x faster
How did we achieve this?
DO LESS WORK
• Do fewer IOs
• Minimize network packets
• Cache prior results
• Offload the database engine
BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES
IO traffic in MySQL
MYSQL WITH REPLICA
[Figure: primary instance and replica instance in AZ 1 and AZ 2, each backed by Amazon Elastic Block Store (EBS) with an EBS mirror, plus Amazon S3. Write steps are numbered 1 to 5.]
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
• Issue write to EBS; EBS issues to mirror, ack when both done
• Stage write to standby instance
• Issue write to EBS on standby instance
OBSERVATIONS
• Steps 1, 3, 5 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes
PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction
30-minute SysBench write-only workload, 100GB dataset, RDS Multi-AZ, 30K PIOPS
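The per-transaction average follows directly from the totals above. A small sketch of the arithmetic, using only the figures reported for this run (variable names are illustrative):

```python
# Figures reported above for the 30-minute RDS MySQL run.
transactions = 780_000
ios_per_million_txns = 7_388_000
run_seconds = 30 * 60

ios_per_txn = ios_per_million_txns / 1_000_000   # I/Os per transaction
txns_per_second = transactions / run_seconds     # average throughput
estimated_ios = ios_per_txn * transactions       # I/Os over the whole run

print(f"{ios_per_txn:.1f} I/Os per transaction")
print(f"{txns_per_second:.0f} transactions per second")
print(f"{estimated_ios / 1e6:.2f}M I/Os in total")
```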
IO traffic in Aurora
[Figure: AMAZON AURORA. Primary instance and replica instance across AZ 1, AZ 2, and AZ 3, with Amazon S3. Labels: ASYNC, 4/6 QUORUM, DISTRIBUTED WRITES.]
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
• Boxcar redo log records; fully ordered by LSN
• Shuffle to appropriate segments; partially ordered
• Boxcar to storage nodes and issue writes
OBSERVATIONS
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency
PERFORMANCE
• 27,378K transactions (35X more)
• 950K I/Os per 1M txns, 6X amplification (7.7X less)
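The boxcar-and-shuffle flow described for Aurora can be sketched as follows. This is an illustrative sketch only, not Aurora's implementation: the RedoRecord type, the page-to-segment mapping (PAGES_PER_SEGMENT), and the batch size (BOXCAR_SIZE) are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

PAGES_PER_SEGMENT = 4   # hypothetical mapping granularity
BOXCAR_SIZE = 3         # hypothetical maximum records per batch


@dataclass(frozen=True)
class RedoRecord:
    lsn: int    # log sequence number, globally ordered
    page: int   # page the record modifies


def shuffle_to_segments(records):
    """Group LSN-ordered records by segment; order is kept within each segment."""
    per_segment = defaultdict(list)
    for rec in sorted(records, key=lambda r: r.lsn):
        per_segment[rec.page // PAGES_PER_SEGMENT].append(rec)
    return dict(per_segment)


def boxcar(records, size=BOXCAR_SIZE):
    """Split a record list into batches so several records share one write."""
    return [records[i:i + size] for i in range(0, len(records), size)]


log = [RedoRecord(lsn, page) for lsn, page in
       [(1, 0), (2, 5), (3, 1), (4, 9), (5, 2), (6, 3), (7, 4)]]
batches = {seg: boxcar(recs) for seg, recs in shuffle_to_segments(log).items()}
for seg, boxes in sorted(batches.items()):
    print(seg, [[r.lsn for r in box] for box in boxes])
```

Within a segment the LSNs stay in order, but across segments there is no single order, which is what "partially ordered" refers to.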
Scaling updates with replicas
SysBench write-only workload, 250 tables

Updates per second   Amazon Aurora   RDS MySQL 30K IOPS (single AZ)
1,000                2.62 ms         0 s
2,000                3.42 ms         1 s
5,000                3.94 ms         60 s
10,000               5.38 ms         300 s

Up to 500x lower lag
“In RDS MySQL, we saw replica lag spike to almost 12 minutes which
is almost absurd from an application’s perspective. The maximum
read replica lag across 4 replicas never exceeded beyond 20 ms.”
Real-life data - read replica latency
IO traffic in Aurora Replicas
[Figure: MySQL master (30% read, 70% write) and MySQL replica (30% new reads, 70% write, single-threaded binlog apply), each with its own data volume; Aurora master (30% read, 70% write) and Aurora replica (100% new reads, page cache update) on shared Multi-AZ storage]
MYSQL READ SCALING
• Logical: Ship SQL statements to Replica
• Write workload similar on both instances
• Independent storage
• Can result in data drift between Master and Replica
AMAZON AURORA READ SCALING
• Physical: Ship redo from Master to Replica
• Replica shares storage. No writes performed
• Cached pages have redo applied
• Advance read view when all commits seen
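The Aurora replica behavior described above (apply redo to cached pages, perform no writes, advance the read view on commit) can be illustrated with a toy model. This is a sketch under stated assumptions, not Aurora's code: the ReplicaCache class, its method names, and the simple "commit LSN" rule are hypothetical.

```python
class ReplicaCache:
    """Toy Aurora-style replica: applies shipped redo to cached pages only."""

    def __init__(self, cached_pages):
        self.pages = dict(cached_pages)  # page id -> contents held in cache
        self.applied_lsn = 0             # last redo record processed
        self.read_view_lsn = 0           # newest state visible to readers

    def receive_redo(self, lsn, page, contents):
        # Pages not in cache are skipped: shared storage already has them,
        # so the replica itself never writes.
        if page in self.pages:
            self.pages[page] = contents
        self.applied_lsn = lsn

    def commit_seen(self, commit_lsn):
        # Expose changes to readers only once redo up to the commit is applied.
        if commit_lsn <= self.applied_lsn:
            self.read_view_lsn = max(self.read_view_lsn, commit_lsn)


replica = ReplicaCache({1: "old-1", 2: "old-2"})
replica.receive_redo(10, 1, "new-1")   # cached page: updated in place
replica.receive_redo(11, 7, "new-7")   # uncached page: nothing to do
replica.commit_seen(11)
print(replica.pages, replica.read_view_lsn)
```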
Storage durability
Storage volume automatically grows up to 64 TB
Quorum system for read/write; latency tolerant
Peer to peer gossip replication to fill in holes
Continuous backup to S3 (built for 11 9s durability)
Continuous monitoring of nodes and disks for repair
10GB segments as unit of repair or hotspot rebalance
Quorum membership changes do not stall writes
[Figure: storage nodes across AZ 1, AZ 2, and AZ 3, with Amazon S3]
Six copies across three availability zones
4 out of 6 write quorum; 3 out of 6 read quorum
Peer-to-peer replication for repairs
Volume striped across hundreds of storage nodes
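The quorum sizes above can be checked with a few lines of arithmetic. A minimal sketch: the constant and function names are illustrative, and COPIES_PER_AZ = 2 assumes the six copies are spread evenly over the three availability zones.

```python
COPIES = 6          # six copies across three availability zones
WRITE_QUORUM = 4    # 4 out of 6
READ_QUORUM = 3     # 3 out of 6
COPIES_PER_AZ = 2   # assumption: copies spread evenly over three AZs


def quorums_overlap(copies, write_q, read_q):
    """Standard quorum rules: every read set intersects every write set,
    and any two write sets intersect."""
    return read_q + write_q > copies and 2 * write_q > copies


def availability(failed_copies):
    """Return which operations remain possible after some copies fail."""
    alive = COPIES - failed_copies
    return {"write": alive >= WRITE_QUORUM, "read": alive >= READ_QUORUM}


print(quorums_overlap(COPIES, WRITE_QUORUM, READ_QUORUM))
print(availability(COPIES_PER_AZ))        # one whole AZ lost
print(availability(COPIES_PER_AZ + 1))    # one AZ plus one more copy lost
```

Under these assumptions, losing a whole AZ leaves four copies, so reads and writes both continue; losing one more copy leaves three, which still satisfies the read quorum only.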
[Figure: two diagrams of the SQL, Transaction, and Caching layers over storage in AZ 1, AZ 2, and AZ 3, captioned "Read and write availability" and "Read availability"]
Fault-tolerant storage
Continuous backup
[Figure: timeline for Segment 1, Segment 2, and Segment 3 showing segment snapshots, log records, and the recovery point]
• Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously
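The restore steps above can be sketched as snapshot-plus-log replay per segment. This is a simplified model, not the actual restore path: the snapshot and log layouts, and the restore_segment and restore_volume helpers, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def restore_segment(snapshot, log_records, recovery_lsn):
    """Rebuild one segment: start from its snapshot, replay newer log records."""
    pages = dict(snapshot["pages"])
    for lsn, page, value in sorted(log_records):
        # Skip records already covered by the snapshot or past the recovery point.
        if snapshot["lsn"] < lsn <= recovery_lsn:
            pages[page] = value
    return pages


def restore_volume(snapshots, logs, recovery_lsn):
    """Restore every segment independently, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {
            seg: pool.submit(restore_segment, snap, logs.get(seg, []), recovery_lsn)
            for seg, snap in snapshots.items()
        }
        return {seg: fut.result() for seg, fut in futures.items()}


snapshots = {
    1: {"lsn": 10, "pages": {"a": "v1"}},
    2: {"lsn": 12, "pages": {"b": "v1"}},
}
logs = {
    1: [(11, "a", "v2"), (15, "a", "v3")],
    2: [(9, "b", "v0"), (13, "c", "v1")],
}
restored = restore_volume(snapshots, logs, recovery_lsn=14)
print(restored)
```

Because each segment needs only its own snapshot and log stream, segments can be restored without coordinating with one another.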
Survivable caches
We moved the cache out of the
database process
Cache remains warm in the event of
database restart
Lets you resume fully loaded
operations much faster
Instant crash recovery + survivable
cache = quick and easy recovery from
DB failures
[Figure: SQL, Transactions, and Caching layers shown three times. Caption: Caching process is outside the DB process and remains warm across a database restart]
Traditional Databases
Have to replay logs since the last
checkpoint
Typically 5 minutes between checkpoints
Single-threaded in MySQL; requires a
large number of disk accesses
Amazon Aurora
Underlying storage replays redo records
on demand as part of a disk read
Parallel, distributed, asynchronous
No replay for startup
[Figure: timelines of checkpointed data and redo log, with a crash at T0]
Traditional databases: a crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint.
Amazon Aurora: a crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.
Instant crash recovery
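The Aurora behavior described above, where storage replays redo records on demand as part of a read instead of at startup, can be illustrated with a toy segment. This is a sketch of the idea only; the Segment class and its fields are hypothetical and not Aurora's storage code.

```python
class Segment:
    """Toy storage segment that applies pending redo lazily, per page."""

    def __init__(self, pages):
        self.pages = dict(pages)   # materialized page contents
        self.pending = {}          # page id -> redo values not yet applied
        self.replayed = 0          # number of redo records applied so far

    def append_redo(self, page, value):
        self.pending.setdefault(page, []).append(value)

    def read(self, page):
        # Redo is applied only for the page being read, at read time.
        for value in self.pending.pop(page, []):
            self.pages[page] = value
            self.replayed += 1
        return self.pages[page]


seg = Segment({"p1": 0, "p2": 0})
seg.append_redo("p1", 1)
seg.append_redo("p1", 2)
seg.append_redo("p2", 9)

# After a simulated restart, nothing is replayed up front.
replayed_at_startup = seg.replayed
value = seg.read("p1")   # replays only the two records for p1
print(replayed_at_startup, value, seg.replayed)
```

In this model the database can accept reads immediately after a restart, and pages that are never read never pay the replay cost.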
Read replicas are failover targets
Aurora cluster contains primary node
and up to fifteen secondary nodes
Failing database nodes are
automatically detected and replaced
Failing database processes are
automatically detected and recycled
Secondary nodes automatically
promoted on persistent outage, no
single point of failure
Customer application may scale-out
read traffic across secondary nodes
[Figure: primary node and secondary nodes distributed across AZ 1, AZ 2, and AZ 3]
Customer specifiable fail-over order
Read balancing across read replicas
Faster failover
[Figure: failover timelines for MYSQL and for AURORA WITH MARIADB DRIVER, each showing DB failure, failure detection, DNS propagation, recovery, and app running. Durations shown: 15-20 sec and 3-20 sec.]
“In RDS MySQL, it took minutes or sometimes tens of minutes to failover.
It’s pretty awesome that you can failover/restart within less than a minute.”
Real-life data – fail-over time
Simulate failures using SQL
To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]
To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN
[DISK index | NODE index] FOR INTERVAL interval
To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type
[TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
Well established MySQL ecosystem
Business Intelligence Data Integration Query and Monitoring SI and Consulting
Source: Amazon
“We ran our compatibility test suites against Amazon Aurora and everything
just worked.” - Dan Jewett, Vice President of Product Management at Tableau
Just add read-only AWS credentials and select the services you wish to monitor (e.g. RDS)
Monitoring Aurora with Datadog
Monitor the new RDS enhanced metrics for high-resolution system-level metrics
Aurora enhanced metrics
Correlate Aurora metrics with metrics and events from the rest of your infrastructure
Monitoring the whole stack
1. Establish baseline
a. RDS MySQL to Aurora DB
snapshot migration
b. MySQL dump/import
2. Catch-up changes
a. Binlog replication
b. Tungsten replicator
Simplify migration from RDS MySQL
[Figure: application users, network, MySQL, and Aurora]
Migration from EC2 & on-premise MySQL
Data migration service
• Logical data replication from on-premise or EC2
• Code & schema conversion across engines
S3 integration
• Load partial datasets directly from / to S3
• Ingest large database snapshots (>2TB)
Snowball integration
• Ingest huge database snapshots (>10TB)
• Send us your data in a suitcase!
Move data to the same or different database engine
Keep your apps running during the migration
Start your first migration in 10 minutes or less
Replicate within, to, or from Amazon EC2 or RDS
AWS Database
Migration Service
Migration from non-MySQL databases