Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
TRANSCRIPT
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Grant McAlister – Senior Principal Engineer - RDS
March 2017
Amazon RDS for PostgreSQL: What’s New and Lessons Learned
Amazon Aurora with PostgreSQL Compatibility
• PostgreSQL 9.6+
• Cloud Optimized
• Log based
• 6 Copies across 3 Availability Zones
• Up to 15 Read Replicas
• Faster Failover
• Enhanced Scaling
• Autoscaling of storage to 64TB
[Diagram: SQL, Transactions, and Caching stay in the database instance; Logging + Storage move down into the distributed storage layer, backed up to Amazon S3]
PREVIEW
RDS Version Updates
New Major Version – 9.6
New Minor Releases (soon)
• 9.6.2
• 9.5.6
• 9.4.11
• 9.3.16
Extension Support Additions
9.6.1: bloom & pg_visibility
9.6.2: log_fdw, pg_hint_plan & pg_freespacemap

Extensions supported:
• 9.3 Original - 32
• 9.3 Current - 35
• 9.4 Current - 39
• 9.5 Current - 44
• 9.6 Current - 49
• Future - ???
log_fdw – set log_destination to csvlog
postgres=> create extension log_fdw;
postgres=> CREATE SERVER log_fdw_server FOREIGN DATA WRAPPER log_fdw;
postgres=> select * from list_postgres_log_files();
            file_name             | file_size_bytes
----------------------------------+-----------------
 postgresql.log.2017-03-28-17.csv |            2068
 postgres.log                     |             617
postgres=> select create_foreign_table_for_log_file('pg_csv_log','log_fdw_server','postgresql.log.2017-03-28-17.csv');
postgres=> select log_time, message from pg_csv_log where message like 'connection%';
          log_time          |                                   message
----------------------------+------------------------------------------------------------------------------
 2017-03-28 17:50:01.862+00 | connection received: host=ec2-54-174-205.compute-1.amazonaws.com port=45626
 2017-03-28 17:50:01.868+00 | connection authorized: user=mike database=postgres
log_fdw - continued
This can also be done without csvlog:
postgres=> select create_foreign_table_for_log_file('pg_log','log_fdw_server','postgresql.log.2017-03-28-17');
postgres=> select log_entry from pg_log where log_entry like '%connection%';
                                                              log_entry
----------------------------------------------------------------------------------------------------------------------------------------------------
 2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):[unknown]@[unknown]:[20434]:LOG: received: host=ec2-54-174-205..amazonaws.com
 2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:LOG: connection authorized: user=mike database=postgres
 2017-03-28 17:57:44 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:ERROR: column "connection" does not exist at character 143
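Once the CSV log is mapped as a foreign table, ordinary SQL aggregation works on it. A small sketch, assuming pg_csv_log was created as above and that create_foreign_table_for_log_file exposes the standard csvlog columns (error_severity is one of them):

```sql
-- Count CSV log entries by severity (LOG, ERROR, FATAL, ...).
SELECT error_severity, count(*)
FROM pg_csv_log
GROUP BY error_severity
ORDER BY count(*) DESC;
```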
pg_hint_plan
Add to shared_preload_libraries
• pg_hint_plan.debug_print
• pg_hint_plan.enable_hint
• pg_hint_plan.enable_hint_table
• pg_hint_plan.message_level
• pg_hint_plan.parse_messages
pg_hint_plan - example
postgres=> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Sort  (cost=15943073.17..15993073.17 rows=20000000 width=465)
   Sort Key: a.aid
   ->  Hash Join  (cost=5.50..802874.50 rows=20000000 width=465)
         Hash Cond: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..527869.00 rows=20000000 width=97)
         ->  Hash  (cost=3.00..3.00 rows=200 width=364)
               ->  Seq Scan on pgbench_branches b  (cost=0.00..3.00 rows=200 width=364)
postgres=> /*+ NestLoop(a b) */
postgres-> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..44297240.44 rows=20000000 width=465)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.44..847232.44 rows=20000000 width=97)
   ->  Index Scan using pgbench_branches_pkey on pgbench_branches b  (cost=0.14..2.16 rows=1 width=364)
         Index Cond: (bid = a.bid)
Major version upgrade
[Diagram: Prod 9.5 → pg_upgrade → Prod 9.6, with a Backup taken before and after; no PITR across the upgrade boundary. Restore a backup to a Test 9.5 instance, pg_upgrade it to Test 9.6, then run Application Testing.]
Security
Forcing SSL on all connections
[Diagram: Application connects over SSL to the DB Instance (HostSSL) inside a Security Group and VPC; Logs, Backups, and Snapshots are covered by Encryption at Rest.]
• A client can still request ssl_mode=disable
• rds.force_ssl=1 (default 0) rejects non-SSL connections
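One way to confirm that enforcement is working is to check which sessions are actually using SSL. A sketch assuming 9.5+, where the pg_stat_ssl view is available:

```sql
-- Count current backends by whether their connection is encrypted.
-- With rds.force_ssl=1, the ssl=false bucket should be empty.
SELECT s.ssl, count(*)
FROM pg_stat_ssl s
JOIN pg_stat_activity a USING (pid)
GROUP BY s.ssl;
```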
Unencrypted Snapshot Sharing
[Diagram: a Snapshot of the Prod Account DB Instance is shared with the Test Account, which copies it and restores a DB Instance from it. Unencrypted snapshots can also be shared to Public.]
Encrypted Snapshot Sharing
[Diagram: with Encryption at Rest, a snapshot encrypted with the Default key cannot be shared; use a Custom Key, add the external account to that key, then share the Snapshot with the Test Account, which can copy it and restore a DB Instance.]
Cross Region Replicas – Encrypted
[Diagram: in US-EAST-1, the Application writes to a Multi-AZ Primary with sync replication to a Secondary in AZ2; async replication feeds an encrypted Read Replica in EU-WEST-1 serving its own Application.]
HIPAA-eligible service & FedRAMP
• RDS PostgreSQL is now a HIPAA-eligible service
  https://aws.amazon.com/compliance/hipaa-compliance/
• FedRAMP in AWS GovCloud (US) region
  https://aws.amazon.com/compliance/fedramp/
Data movement
Move data to the same or different database engine
Keep your apps running during the migration
Start your first migration in 10 minutes or less
Replicate within, to, or from AWS EC2 or RDS
AWS Database Migration Service (DMS)
[Diagram: a source on Customer Premises or in EC2/RDS connects over the Internet or a VPN to a DMS replication instance, which loads an EC2 or RDS target while Application Users stay online.]
Keep your apps running during the migration:
• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables and load data
• Use change data capture to keep them in sync
• Switch applications over to the target at your convenience
AWS Database Migration Service - PostgreSQL
• Source - on premise or EC2 PostgreSQL (9.4+), RDS (9.4.9+, 9.5.4+, or 9.6.1+)
• Destination can be EC2 or RDS
• Initial bulk copy via consistent select
• Uses PostgreSQL logical replication support to provide change data capture
https://aws.amazon.com/dms/
Schema Conversion Tool - SCT
Downloadable tool (Windows, Mac, Linux Desktop)
Source Database      | Target Database on Amazon RDS
---------------------+----------------------------------
Microsoft SQL Server | Amazon Aurora, MySQL, PostgreSQL
MySQL                | PostgreSQL
Oracle               | Amazon Aurora, MySQL, PostgreSQL
PostgreSQL           | Amazon Aurora, MySQL
SCT - Analysis
SCT - Detailed
Logical Replication Support
• Supported with 9.6.1+, 9.5.4+ and 9.4.9+
• Set the rds.logical_replication parameter to 1
• As a user who has the rds_replication & rds_superuser roles
SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');
pg_recvlogical -d postgres --slot test_slot -U master --host $rds_hostname -f - --start
• Added support for Event Triggers
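The newly added event trigger support can be exercised with a short sketch; the function and trigger names here are illustrative, not part of RDS:

```sql
-- Log every DDL command as it completes; TG_TAG holds the command tag.
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'DDL executed: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_ddl_trigger
  ON ddl_command_end EXECUTE PROCEDURE log_ddl();
```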
Logical Decoding Space Usage
CloudWatch – Replication Lag
CloudWatch – Slot usage for WAL
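Besides the CloudWatch metric, slot-driven WAL retention can be inspected from SQL. A sketch using the 9.4–9.6 function names (renamed to pg_wal_lsn_diff / pg_current_wal_lsn in PostgreSQL 10):

```sql
-- How much WAL each replication slot is holding back.
SELECT slot_name, active,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                            restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```

An inactive slot retains WAL indefinitely, so this is worth watching to avoid filling storage.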
Logical Replication Support - Example
[Diagram: an RDS Postgres instance feeds DMS, which fans out to an RDS Postgres logical replica, Redshift, EC2 Postgres, On Premise Postgres, EC2 Oracle, and S3 (new); a custom logical handler can also feed a NoSQL DB.]
Lessons
Vacuum parameters
A table is autovacuumed when dead tuples exceed:
• autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples

How hard autovacuum works:
• autovacuum_max_workers
• autovacuum_naptime
• autovacuum_vacuum_cost_limit
• autovacuum_vacuum_cost_delay
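The trigger formula above can be checked per table. A sketch using the default settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2); substitute your parameter group values:

```sql
-- A table is vacuumed once n_dead_tup exceeds 50 + 0.2 * reltuples.
SELECT s.relname, s.n_dead_tup,
       50 + 0.2 * c.reltuples AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```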
Transaction ID Wrap Around
RDS autovacuum logging (9.4.5+)
log_autovacuum_min_duration = 5000 (i.e. 5 secs)
rds.force_autovacuum_logging_level = LOG

…[14638]:ERROR: canceling autovacuum task
…[14638]:CONTEXT: automatic vacuum of table "postgres.public.pgbench_tellers"
…[14638]:LOG: skipping vacuum of "pgbench_branches" --- lock not available
RDS autovacuum visibility (9.3.12, 9.4.7, 9.5.2)
pg_stat_activity
BEFORE
 usename  |                            query
----------+-------------------------------------------------------------
 rdsadmin | <insufficient privilege>
 rdsadmin | <insufficient privilege>
 gtest    | SELECT c FROM sbtest27 WHERE id BETWEEN 392582 AND 392582+4
 gtest    | select usename, query from pg_stat_activity

NOW
 usename  |                    query
----------+----------------------------------------------
 rdsadmin | <insufficient privilege>
 gtest    | select usename, query from pg_stat_activity
 gtest    | COMMIT
 rdsadmin | autovacuum: ANALYZE public.sbtest16
CloudWatch Metric
Scale and availability
M4 Instance Class – pgbench read only
[Chart: transactions per second (TPS) vs client threads (1–16) for db.m3.large and db.m4.large]
37% TPS increase; 46% better price/performance ($0.195/hr db.m3.large vs $0.182/hr db.m4.large)
Enhanced Operating System (OS) metrics
1-60 second granularity
cpuUtilization: guest, irq, system, wait, idle, user, total, steal, nice
diskIO: writeKbPS, readIOsPS, await, readKbPS, rrqmPS, util, avgQueueLen, tps, readKb, writeKb, avgReqSz, wrqmPS, writeIOsPS
memory: writeback, cached, free, inactive, dirty, mapped, active, total, slab, buffers, pageTable, hugePages
swap: cached, total, free
tasks: sleeping, zombie, running, stopped, total, blocked
fileSys: used, usedFiles, usedFilePercent, maxFiles, total, usedPercent
loadAverageMinute: fifteen, five, one
uptime
processList: name, cpuTime, parentID, memoryUsedPct, cpuUsedPct, id, rss, vss
Process List
OS metrics
Performance Insights – In Preview
Performance Insights – In Preview
Aurora PostgreSQL
Read Replicas = Availability
[Diagram: the Application sends writes & consistent reads to a Multi-AZ Primary (sync replication to a Secondary); async replication feeds Read Replicas that serve eventually consistent reads and keep serving during failover, Upgrade, and Modify DB operations.]
[Diagram: the Application connects to Aurora in AZ-1; Aurora Storage keeps six copies spread across AZ-1, AZ-2, and AZ-3, and a write commits on 4/6 sync writes.]
Aurora Timing Example
[Diagram: a write starts toward six storage locations and finishes when any four acknowledge – only 4/6 sync writes are needed, so the two slowest copies never gate the commit.]
High Concurrency Sync Write Test – latency (ms) at the 50th / 90th / 99.9th / 99.99th percentiles:
• 3 Node (6 Copy): 6.1 / 10.4 / 21.1 / 30.8
• 2 Node (4 Copy): 7 / 12 / 28 / 123
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
Queued Work
Storage
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
Storage
A Queued Work
Storage
B
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
StorageA
Queued Work
StorageB C D E
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
StorageA
Queued Work
StorageB C D E
2 2 1 0 1A B C D E
DurabilityTracking
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
StorageA
Queued Work
StorageB C D E
4 3 4 2 4A B C D E
DurabilityTracking
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
StorageA
Queued Work
StorageB C D E
4 3 4 2 4A B C D E
DurabilityTracking
Concurrency
Queued Work
Log Buffer
PostgreSQL Aurora
StorageA
Queued Work
StorageB C D E
6 5 6 3 5A B C D E
DurabilityTracking
Aurora Storage and Replicas
[Diagram: an RW instance and RO replicas in AZ-1/AZ-2/AZ-3 all read the same Aurora Storage; the writer sends async invalidation & update messages to keep replica caches current.]
Aurora – Writing Less
[Diagram sequence: for "update t set y = 6;", PostgreSQL modifies the block in memory, writes WAL (including a full block image), later checkpoints the block to the datafile, and archives WAL; Aurora modifies the block in memory and sends only the log record to Aurora Storage – no full-page writes, checkpoints, or archive copies from the instance.]
Amazon Aurora Loads Data 3x Faster Database initialization is three times faster than PostgreSQL using the standard PgBench benchmark
Command: pgbench -i -s 2000 -F 90
Amazon Aurora Delivers up to 85x Faster Recovery
SysBench oltp(write-only) 10GiB workload with 250 tables & 150,000 rows
[Chart: writes per second vs crash recovery time, SysBench 10GB write workload]
PostgreSQL, 12.5GB checkpoint: 69,620 writes/sec, 102.0 s recovery
PostgreSQL, 8.3GB checkpoint: 32,765 writes/sec, 52.0 s recovery
PostgreSQL, 2.1GB checkpoint: 16,075 writes/sec, 13.0 s recovery
Amazon Aurora, no checkpoints: 92,415 writes/sec, 1.2 s recovery
Transaction-aware storage system recovers almost instantly
Amazon Aurora is >=2x Faster on PgBench
pgbench “tpcb-like” workload, scale 2000 (30GiB). All configurations run for 60 minutes
Amazon Aurora is 2x-3x Faster on SysBench
Amazon Aurora delivers 2x the absolute peak of PostgreSQL and 3x PostgreSQL performance at high client counts
SysBench oltp(write-only) workload with 30 GB database with 250 tables and 400,000 initial rows per table
Amazon Aurora Gives >2x Faster Response Times
Response time under heavy write load >2x faster than PostgreSQL (and >10x more consistent)
SysBench oltp(write-only) 23GiB workload with 250 tables and 300,000 initial rows per table. 10-minute warmup.
Amazon Aurora Has More Consistent Throughput
While running at load, performance is more than three times more consistent than PostgreSQL
PgBench “tpcb-like” workload at scale 2000. Amazon Aurora was run with 1280 clients. PostgreSQL was run with 512 clients (the concurrency at which it delivered the best overall throughput)
Amazon Aurora is 3x Faster at Large Scale
Scales from 1.5x to 3x faster as database grows from 10 GiB to 100 GiB
SysBench oltp(write-only) – 10GiB with 250 tables & 150,000 rows and 100GiB with 250 tables & 1,500,000 rows
[Chart: SysBench write-only, writes/sec by test size]
10GB: PostgreSQL 75,666 vs Amazon Aurora 112,390 (≈1.5x)
100GB: PostgreSQL 27,491 vs Amazon Aurora 82,714 (≈3x)
Thank you!
Questions?