Amazon RDS for PostgreSQL: What's New and Lessons Learned – NY 2017


© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Grant McAlister – Senior Principal Engineer - RDS

March 2017

Amazon RDS for PostgreSQL – What's New and Lessons Learned

Amazon Aurora with PostgreSQL Compatibility

• PostgreSQL 9.6+
• Cloud Optimized
• Log based
• 6 copies across 3 Availability Zones
• Up to 15 Read Replicas
• Faster Failover
• Enhanced Scaling
• Autoscaling of storage to 64TB

[Architecture diagram: SQL, Transactions, and Caching stay in the database instance; Logging + Storage are pushed down to the distributed storage layer, with backups to Amazon S3]

PREVIEW


RDS Version Updates

New Major Version – 9.6

New Minor Releases (soon)
• 9.6.2
• 9.5.6
• 9.4.11
• 9.3.16

Extension Support Additions

9.6.1: bloom & pg_visibility
9.6.2: log_fdw, pg_hint_plan & pg_freespacemap

rds-postgres-extensions-request@amazon.com
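Once an engine version that includes a given extension is available, it is enabled per database with CREATE EXTENSION. A minimal sketch using pg_visibility from 9.6.1 (pgbench_accounts is just an example target table):

postgres=> create extension pg_visibility;
postgres=> -- visibility map bits for block 0 of a table
postgres=> select all_visible, all_frozen from pg_visibility_map('pgbench_accounts', 0);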

Supported extension counts by version:
• 9.3 original: 32
• 9.3 current: 35
• 9.4 current: 39
• 9.5 current: 44
• 9.6 current: 49
• Future: ???

log_fdw – set log_destination to csvlog

postgres=> create extension log_fdw;

postgres=> CREATE SERVER log_fdw_server FOREIGN DATA WRAPPER log_fdw;

postgres=> select * from list_postgres_log_files();
            file_name             | file_size_bytes
----------------------------------+-----------------
 postgresql.log.2017-03-28-17.csv |            2068
 postgres.log                     |             617

postgres=> select create_foreign_table_for_log_file('pg_csv_log','log_fdw_server','postgresql.log.2017-03-28-17.csv');

postgres=> select log_time, message from pg_csv_log where message like 'connection%';
          log_time           |                                   message
-----------------------------+------------------------------------------------------------------------------
 2017-03-28 17:50:01.862+00  | connection received: host=ec2-54-174-205.compute-1.amazonaws.com port=45626
 2017-03-28 17:50:01.868+00  | connection authorized: user=mike database=postgres

log_fdw - continued

Can be done without csvlog:
postgres=> select create_foreign_table_for_log_file('pg_log','log_fdw_server','postgresql.log.2017-03-28-17');

postgres=> select log_entry from pg_log where log_entry like '%connection%';

log_entry
------------------------------------------------------------------------------------------------------------------------------
2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):[unknown]@[unknown]:[20434]:LOG: received: host=ec2-54-174-205..amazonaws.com
2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:LOG: connection authorized: user=mike database=postgres
2017-03-28 17:57:44 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:ERROR: column "connection" does not exist at character 143

pg_hint_plan

Add to shared_preload_libraries

• pg_hint_plan.debug_print
• pg_hint_plan.enable_hint
• pg_hint_plan.enable_hint_table
• pg_hint_plan.message_level
• pg_hint_plan.parse_messages

pg_hint_plan – example
postgres=> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Sort  (cost=15943073.17..15993073.17 rows=20000000 width=465)
   Sort Key: a.aid
   ->  Hash Join  (cost=5.50..802874.50 rows=20000000 width=465)
         Hash Cond: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..527869.00 rows=20000000 width=97)
         ->  Hash  (cost=3.00..3.00 rows=200 width=364)
               ->  Seq Scan on pgbench_branches b  (cost=0.00..3.00 rows=200 width=364)

postgres=> /*+ NestLoop(a b) */
postgres-> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..44297240.44 rows=20000000 width=465)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.44..847232.44 rows=20000000 width=97)
   ->  Index Scan using pgbench_branches_pkey on pgbench_branches b  (cost=0.14..2.16 rows=1 width=364)
         Index Cond: (bid = a.bid)

Major version upgrade

[Diagram: Prod 9.5 → pg_upgrade → Prod 9.6, with a backup taken before and after the upgrade; there is no PITR across the upgrade boundary]

Restore a backup to a test instance, run the same pg_upgrade from Test 9.5 to Test 9.6, and do application testing before upgrading production.

Security

Forcing SSL on all connections

[Diagram: the application connects to the DB instance over SSL (hostssl), with the snapshot, logs, backups, security group, VPC, and encryption at rest shown around the instance]

A client can still connect unencrypted by setting sslmode=disable.

rds.force_ssl=1 (default 0) rejects non-SSL connections at the server.
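To confirm from the database side that a session really is encrypted, the sslinfo extension can be used (assuming it is available on your engine version); a minimal sketch:

postgres=> create extension sslinfo;
postgres=> -- returns true plus the TLS details when the current connection uses SSL
postgres=> select ssl_is_used(), ssl_version(), ssl_cipher();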

Unencrypted Snapshot Sharing

[Diagram: in the Prod account, the DB instance is backed up to a snapshot; the snapshot is shared with the Test account (or shared to public), which restores its own DB instance from it]

Encrypted Snapshot Sharing

[Diagram: in the Prod account, the DB instance uses encryption at rest; a snapshot encrypted with the default key cannot be shared. Encrypt with a custom KMS key instead, add the external account to that key, then share the snapshot with the Test account, which restores its own DB instance from it]

Cross Region Replicas – Encrypted

[Diagram: a Multi-AZ primary with a synchronous secondary in US-EAST-1 (AZ1/AZ2) serves its application; asynchronous replication feeds an encrypted read replica in EU-WEST-1 (AZ1) serving its own application]

HIPAA-eligible service & FedRAMP

• RDS PostgreSQL is now a HIPAA-eligible service – https://aws.amazon.com/compliance/hipaa-compliance/
• FedRAMP in the AWS GovCloud (US) region – https://aws.amazon.com/compliance/fedramp/

Data movement

• Move data to the same or a different database engine
• Keep your apps running during the migration
• Start your first migration in 10 minutes or less
• Replicate within, to, or from AWS EC2 or RDS

AWS Database Migration Service (DMS)

[Diagram: a source database on customer premises or EC2 replicates through a DMS replication instance, over the Internet or a VPN, to a target on EC2 or RDS while application users keep working]

• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables and load data
• Uses change data capture to keep them in sync
• Switch applications over to the target at your convenience

Keep your apps running during the migration.

AWS Database Migration Service - PostgreSQL

• Source – on-premises or EC2 PostgreSQL (9.4+), or RDS (9.4.9+, 9.5.4+, or 9.6.1+)
• Destination can be EC2 or RDS
• Initial bulk copy via a consistent select
• Uses PostgreSQL logical replication support to provide change data capture

https://aws.amazon.com/dms/

Schema Conversion Tool - SCT

Downloadable tool (Windows, Mac, Linux Desktop)

Source Database      | Target Database on Amazon RDS
---------------------+----------------------------------
Microsoft SQL Server | Amazon Aurora, MySQL, PostgreSQL
MySQL                | PostgreSQL
Oracle               | Amazon Aurora, MySQL, PostgreSQL
PostgreSQL           | Amazon Aurora, MySQL

SCT - Analysis

SCT - Detailed

Logical Replication Support
• Supported with 9.6.1+, 9.5.4+ and 9.4.9+
• Set the rds.logical_replication parameter to 1
• As a user who has the rds_replication & rds_superuser roles

SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');

pg_recvlogical -d postgres --slot test_slot -U master --host $rds_hostname -f - --start

• Added support for Event Triggers
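Besides pg_recvlogical, the changes queued in a slot can be read directly with SQL; a minimal sketch against the test_slot created above (test_decoding text output):

postgres=> -- peek without consuming
postgres=> select * from pg_logical_slot_peek_changes('test_slot', NULL, NULL);
postgres=> -- get (and advance the slot) once the changes have been processed
postgres=> select * from pg_logical_slot_get_changes('test_slot', NULL, NULL);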

Logical Decoding Space Usage

CloudWatch – Replication Lag

CloudWatch – Slot usage for WAL
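The WAL retained by a slot can also be checked from SQL; a sketch for 9.4–9.6 (these xlog functions were renamed in later major versions):

postgres=> select slot_name, active,
postgres->        pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) as retained_wal
postgres->   from pg_replication_slots;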

Logical Replication Support – Example

[Diagram: an RDS PostgreSQL instance feeds an RDS PostgreSQL logical replica; through DMS the same change stream also reaches Amazon Redshift, on-premises PostgreSQL, EC2 PostgreSQL, EC2 Oracle, and S3 (new), and a custom logical handler can push changes into a NoSQL database]

Lessons

Vacuum parameters

Will autovacuum a table when dead tuples exceed (see the query sketch below):
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples

How hard autovacuum works:
• autovacuum_max_workers
• autovacuum_naptime
• autovacuum_vacuum_cost_limit
• autovacuum_vacuum_cost_delay
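A sketch of that threshold check against pg_stat_user_tables (assuming no per-table autovacuum overrides):

postgres=> select s.relname, s.n_dead_tup,
postgres->        current_setting('autovacuum_vacuum_threshold')::int
postgres->          + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples as threshold
postgres->   from pg_stat_user_tables s
postgres->   join pg_class c on c.oid = s.relid
postgres->  where s.n_dead_tup > current_setting('autovacuum_vacuum_threshold')::int
postgres->          + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples;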

Transaction ID Wraparound
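Wraparound pressure can be monitored by checking how old each database's datfrozenxid is; autovacuum forces anti-wraparound vacuums as this age approaches autovacuum_freeze_max_age (200 million by default). A minimal sketch:

postgres=> select datname, age(datfrozenxid) as xid_age
postgres->   from pg_database
postgres->  order by xid_age desc;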

RDS autovacuum logging (9.4.5+)

log_autovacuum_min_duration = 5000 (i.e. 5 secs)
rds.force_autovacuum_logging_level = LOG

…[14638]:ERROR:  canceling autovacuum task
…[14638]:CONTEXT:  automatic vacuum of table "postgres.public.pgbench_tellers"
…[14638]:LOG:  skipping vacuum of "pgbench_branches" --- lock not available

RDS autovacuum visibility (9.3.12, 9.4.7, 9.5.2) – pg_stat_activity

BEFORE
 usename  | query
----------+-------------------------------------------------------------
 rdsadmin | <insufficient privilege>
 rdsadmin | <insufficient privilege>
 gtest    | SELECT c FROM sbtest27 WHERE id BETWEEN 392582 AND 392582+4
 gtest    | select usename, query from pg_stat_activity

NOW
 usename  | query
----------+----------------------------------------------
 rdsadmin | <insufficient privilege>
 gtest    | select usename, query from pg_stat_activity
 gtest    | COMMIT
 rdsadmin | autovacuum: ANALYZE public.sbtest16
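With autovacuum sessions now visible, they can be filtered directly; a minimal sketch:

postgres=> select pid, query_start, query
postgres->   from pg_stat_activity
postgres->  where query like 'autovacuum:%';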

CloudWatch Metric

Scale and availability

M4 Instance Class – pgbench read only

[Chart: pgbench read-only transactions per second (TPS) vs. client threads (1, 2, 4, 8, 16) for db.m3.large and db.m4.large – the db.m4.large delivers a 37% TPS increase and 46% better price/performance ($0.195 vs $0.182)]

Enhanced Operating System (OS) metrics

1-60 second granularity

cpuUtilization: guest, irq, system, wait, idle, user, total, steal, nice
diskIO: writeKbPS, readIOsPS, await, readKbPS, rrqmPS, util, avgQueueLen, tps, readKb, writeKb, avgReqSz, wrqmPS, writeIOsPS
memory: writeback, cached, free, inactive, dirty, mapped, active, total, slab, buffers, pageTable, Hugepages
swap: cached, total, free
tasks: sleeping, zombie, running, stopped, total, blocked
fileSys: used, usedFiles, usedFilePercent, maxFiles, total, usedPercent
loadAverageMinute: fifteen, five, one
uptime
processList: name, cpuTime, parentID, memoryUsedPct, cpuUsedPct, id, rss, vss

Process List

OS metrics

Performance Insights – In Preview


Aurora PostgreSQL

Read Replicas = Availability

[Diagram: a Multi-AZ primary with a synchronous secondary handles writes and consistent reads; asynchronous replication feeds read replicas that serve eventually consistent reads, and the replicas keep serving the application while the primary is upgraded or the DB is modified]

4/6 sync writes

[Diagram: an Aurora instance in AZ-1 serves the application and writes to six Aurora Storage copies spread across AZ-1, AZ-2, and AZ-3; each write needs 4 of 6 synchronous acknowledgements]

Aurora Timing Example

[Diagram: a write is sent to storage copies in three locations; between Start and Finish only 4 of the 6 copies need to acknowledge, so the slowest copies do not delay completion]

Only need 4/6 sync writes
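The quorum arithmetic behind this (Aurora pairs the 4/6 write quorum with a 3/6 read quorum): with V = 6 copies, a write quorum Vw = 4 and read quorum Vr = 3 satisfy Vr + Vw > V (3 + 4 > 6) and Vw > V/2 (4 > 3), so every read quorum overlaps the latest write and two conflicting writes cannot both succeed. Losing two copies (a full Availability Zone) still leaves a write quorum, and losing three still leaves a read quorum to repair from.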

[Chart: High Concurrency Sync Write Test – latency (ms) by percentile (50 / 90 / 99.9 / 99.99) for a 2-node (4-copy) vs 3-node (6-copy) configuration; one series reads roughly 6.1 / 10.4 / 21.1 / 30.8 ms and the other 7 / 12 / 28 / 123 ms]

Concurrency

[Diagram series: in PostgreSQL, queued work drains through a single log buffer to storage; in Aurora, queued work is shipped to storage nodes A–E in parallel and durability tracking counts acknowledgements per node (e.g. A=2 B=2 C=1 D=0 E=1 advancing to A=6 B=5 C=6 D=3 E=5), so work completes once a 4/6 quorum has acknowledged rather than waiting on the slowest node]

Aurora Storage and Replicas

[Diagram: a read/write instance and read-only replicas in AZ-1, AZ-2, and AZ-3 share the same Aurora storage volume; the writer sends asynchronous invalidation & update messages to the replicas]

Aurora – Writing Less

[Diagram: for the same "update t set y = 6;", PostgreSQL writes the WAL record (including a full block for the first change after a checkpoint), later writes the block again to the datafile at checkpoint time, and archives the WAL; Aurora keeps the block in memory and only ships the log record to Aurora storage]

Amazon Aurora Loads Data 3x Faster
Database initialization is three times faster than PostgreSQL using the standard pgbench benchmark.

Command: pgbench -i -s 2000 -F 90

Amazon Aurora Delivers up to 85x Faster Recovery
SysBench oltp(write-only), 10 GiB workload with 250 tables & 150,000 rows

Crash Recovery Time – SysBench 10GB Write Workload:

Configuration                    | Writes per Second | Recovery Time (seconds)
---------------------------------+-------------------+------------------------
PostgreSQL (12.5 GB checkpoint)  | 69,620            | 102.0
PostgreSQL (8.3 GB checkpoint)   | 32,765            | 52.0
PostgreSQL (2.1 GB checkpoint)   | 16,075            | 13.0
Amazon Aurora (no checkpoints)   | 92,415            | 1.2

Transaction-aware storage system recovers almost instantly.

Amazon Aurora is >=2x Faster on PgBench

pgbench “tpcb-like” workload, scale 2000 (30GiB). All configurations run for 60 minutes

Amazon Aurora is 2x–3x Faster on SysBench
Amazon Aurora delivers 2x the absolute peak of PostgreSQL and 3x PostgreSQL performance at high client counts.

SysBench oltp(write-only) workload with 30 GB database with 250 tables and 400,000 initial rows per table

Amazon Aurora Gives >2x Faster Response Times
Response time under heavy write load is >2x faster than PostgreSQL (and >10x more consistent).

SysBench oltp(write-only) 23GiB workload with 250 tables and 300,000 initial rows per table. 10-minute warmup.

Amazon Aurora Has More Consistent Throughput
While running at load, performance is more than three times more consistent than PostgreSQL.

PgBench “tpcb-like” workload at scale 2000. Amazon Aurora was run with 1280 clients. PostgreSQL was run with 512 clients (the concurrency at which it delivered the best overall throughput)

Amazon Aurora is 3x Faster at Large Scale
Scales from 1.5x to 3x faster as the database grows from 10 GiB to 100 GiB.

SysBench oltp(write-only) – 10GiB with 250 tables & 150,000 rows and 100GiB with 250 tables & 1,500,000 rows

[Chart: SysBench write-only writes/sec by test size – at 10 GB: PostgreSQL 75,666 vs Amazon Aurora 112,390; at 100 GB: PostgreSQL 27,491 vs Amazon Aurora 82,714]

Thank you!

Questions?
