linux tuning for postgresql at secon 2015

32
Оптимизация Linux для PostgreSQL Алексей Лесовский PostgreSQL-Consulting.com Пенза, 24 апреля 2015 2015.secon.ru

Upload: -

Post on 16-Jul-2015

948 views

Category:

Engineering


8 download

TRANSCRIPT

Page 1: Linux tuning for PostgreSQL at Secon 2015

Оптимизация LinuxдляPostgreSQL

Алексей ЛесовскийPostgreSQL-Consulting.com

Пенза, 24 апреля 20152015.secon.ru

Page 2: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

About:● Алексей Лесовский

● PostgreSQL DBA

● Linux system administrator

PostgreSQL-Consulting.com● Поддержка 24/7

● Аудит и оптимизация производительности

● Консультации и тренинги

● Мониторинг и аварийное реагирование

● Планирование ресурсов

Page 3: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Agenda

● Linux on PostgreSQL ?

● Resources and Databases

● OS subsystems

● CPU & Process scheduling, Power saving

● Memory & VM, NUMA, Huge Pages

● Storage & Filesystems, Input/Output

● Other misc

Page 4: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Intro.

● Linux as a platform for PostgreSQL database

● Active development & Community support

● Rich features & Fast implementation

● Stable & Mature & Durable

● Default settings in Linux

Page 5: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning targets

CPU

Disk

Memory

ConcurrencyQuery Speed

Sort, Group, Hash,...

OS cache (page cache)DB cache (shared_buffers)Operations (work_mem)

DB storageTransactional Log (WAL)

Cold start

Page 6: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning targets

CPU

Disk

Memory

CPU SchedulerNUMA

Power saving

Virtual MemoryNUMAHuge Pages

FileSystemsStorage I/O

Page 7: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning targets:

● CPU scheduler

● Virtual memory and NUMA

● Huge pages

● File systems

● Storage IO

● Power saving policy

● Others

Page 8: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: CPU scheduler

● CPU scheduler is ...

● sysctl: kernel.sched_migration_cost_ns

● sysctl: kernel.sched_autogroup_enabled

● http://www.postgresql.org/message-id/[email protected]

● http://kernelnewbies.org/Linux_2_6_38#head-59575a6aeafa38490226a560ee02de89829a5b20

Page 9: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: CPU scheduler

● WARNING: should use update kernel because

● Ubuntu 12.04 Bug #1055222,

● Ubuntu 14.04 Bug #1422016

A kernel panic occurs while disabling the automatic task group creation within /etc/sysctl.conf using "kernel.sched_autogroup_enabled = 0". After adding the "noautogroup" kernel parameter to GRUB, the system will boot without issue.

Page 10: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: CPU scheduler

● pgbench tests: $ pgbench -S -c 8 -T 30 -U postgres pgbench transaction type: SELECT only scaling factor: 30 duration: 30 s number of clients: 8 number of threads: 1

sched_migration_cost_ns = 50000, sched_autogroup_enabled = 1

- tps: 22621, 22692, 22502

sched_migration_cost_ns = 500000, sched_autogroup_enabled = 0

- tps: 23689, 23930, 23657

Page 11: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Virtual Memory

● Virtual Memory

● Allocator, Caching, Dirty pages and Writeback

● vm.dirty_background_ratio & vm.dirty_ratio

● vm.dirty_background_bytes & vm.dirty_bytes

Page 12: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Virtual Memory

0% of pagecache RAM dirty

do nothing

dirty_backgrouond_ratio (10% of RAM dirty) – wakeup flushd

dirty_ratio (20% of RAM dirty) – process start sync writes

100% of pagecache RAM dirty

flushd writes dirty buffers in background

flushd and write()'ng process write dirty buffer

Page 13: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Virtual Memory

● vm.swappiness

● Out-of-memory & OOM-Killer

Page 14: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: NUMA

c1 c2c3 c4

c1 c2c3 c4

c1 c2c3 c4

c1 c2c3 c4

M1

M3 M4

M2

S1 S2

S4S3

S – SocketC – CPU CoreM – Memory Bank

Page 15: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: NUMA

● BIOS: enable memory node interleaving

● kernel boot: numa=of

● numactl utility

● sysctl: vm.zone_reclaim_mode = 0

● sysctl: kernel.numa_balancing = 0

Page 16: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: NUMA

$ numactl --hardware available: 1 nodes (0-0) -- NUMA disabled node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 0 size: 196578 MB node 0 free: 1122 MB node distances: node 0 0: 10 $ numactl --hardware available: 2 nodes (0-1) -- NUMA enabled node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17 node 0 size: 65501 MB node 0 free: 4186 MB node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23 node 1 size: 65536 MB node 1 free: 10803 MB node distances: node 0 1 0: 10 21 1: 21 10

Page 17: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Huge Pages

● Huge pages vs. Transparent huge pages

● Transparent huge pages

● echo never > /sys/kernel/mm/transparent_hugepage/enabled

● echo never > /sys/kernel/mm/transparent_hugepage/defrag

Page 18: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

PostgreSQL with Huge Pages

● libhugetlbfs

● sysctl: vm.hugetlb_shm_group

● sysctl: vm.nr_hugepages

● sysctl: vm.nr_overcommit_hugepages

● environment var: HUGETLB_SHM & LD_PRELOAD

● postgresql.conf: huge_pages = try | on

Page 19: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

PostgreSQL with Huge Pages

● $ grep ^Huge /proc/meminfo HugePages_Total: 75532 HugePages_Free: 54 HugePages_Rsvd: 13 HugePages_Surp: 0 Hugepagesize: 2048 kB

● $ pmap -x $(pgrep postgres|head -n 1) Address Kbytes RSS Dirty Mode Mapping 00007f24a1e00000 25802752 0 0 rw-s- anon_hugepage (deleted)

Page 20: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Huge Pages: Resume

● Disable Transparent Huge Pages

● PostgreSQL <= 9.2: libhugetlbfs

● PostgreSQL 9.3: huge pages not supported

● PostgreSQL >= 9.4: huge pages native support

Page 21: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Filesystems

● Ext3 vs Ext4 vs XFS

● Filesystem Barriers

● Disable Write Cache● hdparm -W0 /device/● MegaCli64 -LDSetProp -DisDskCache -Lall -aALL

● Hardware RAID + BBU = barrier=0

● Software RAID = barrier=1

● Enterprise SSD with Power Loss Protection = barrier=0

● Consumer SSD w/o Power Loss Protection = barrier=1

● http://www.thessdreview.com/

Page 22: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Filesystems barriers

● $ pgbench -s 30 -c 8 -T 30 -U postgres pgbench transaction type: TPC-B (sort of) scaling factor: 30 query mode: simple number of clients: 8 number of threads: 1 duration: 30 s

barrier=1 barrier=0 tps = 1005 tps = 1142 tps = 988 tps = 1164 tps = 988 tps = 1129

Page 23: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Storage IO

● SATA/SAS vs SSD

● IO elevators ● noop: SSD, hi-end storage● deadline: RAID, SATA/SAS● cfq: good default

● # echo 'elevator_name' > /sys/block/<device>/queue/scheduler

● kernel boot: elevator=<name>

● /sys/block/*/queue/: rotational, rq_affinity, read_ahead_kb

Page 24: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Power saving policy

● acpi_cpufreq vs intel_pstate

● scaling_governor● /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors● /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

● acpi_cpufreq + performance

● intel_pstate + powersave

Page 25: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Power saving policy: acpi_cpufreq + ondemand

$ select o.payment_method_id,pc.name,count(o.payment_method_id) as cnt from operation o LEFT JOIN payment_method pm ON pm.id=o.payment_method_id LEFT JOIN payment_classify pc ON pc.id = pm.payment_classify_id WHERE (type='deposit')

AND (DATE(o.created)>='2015-04-05' AND DATE(o.created)<='2015-04-05') GROUP BY o.payment_method_id,pc.name QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=44.24..44.25 rows=1 width=26) (actual time=238.506..238.512 rows=20 loops=1) -> Nested Loop Left Join (cost=0.01..44.23 rows=1 width=26) (actual time=0.068..232.756 rows=10660 loops=1) -> Nested Loop Left Join (cost=0.01..43.64 rows=1 width=16) (actual time=0.058..206.598 rows=10660 loops=1) Join Filter: (pm.id = o.payment_method_id) Rows Removed by Join Filter: 394420 -> Index Scan using operation_date_created_type_state_key on operation o (cost=0.01..41.78 rows=1 width=8) (actual time=0.042..109.343 rows=10660 loops=1) Index Cond: ((date(created) >= '2015-04-05'::date) AND (date(created) <= '2015-04-05'::date) AND (type = 'deposit'::op_type)) -> Seq Scan on payment_method pm (cost=0.00..1.38 rows=38 width=12) (actual time=0.001..0.004 rows=38 loops=10660) -> Index Scan using pc_pkey on payment_classify pc (cost=0.00..0.58 rows=1 width=22) (actual time=0.002..0.002 rows=1 loops=10660) Index Cond: (id = pm.payment_classify_id)

Total runtime: 238.630 ms

Page 26: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Power saving policy: acpi_cpufreq + performance

$ select o.payment_method_id,pc.name,count(o.payment_method_id) as cnt from operation o LEFT JOIN payment_method pm ON pm.id=o.payment_method_id LEFT JOIN payment_classify pc ON pc.id = pm.payment_classify_id WHERE (type='deposit')

AND (DATE(o.created)>='2015-04-05' AND DATE(o.created)<='2015-04-05') GROUP BY o.payment_method_id,pc.name QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=44.24..44.25 rows=1 width=26) (actual time=193.589..193.593 rows=20 loops=1) -> Nested Loop Left Join (cost=0.01..44.23 rows=1 width=26) (actual time=0.034..188.948 rows=10660 loops=1) -> Nested Loop Left Join (cost=0.01..43.64 rows=1 width=16) (actual time=0.030..166.176 rows=10660 loops=1) Join Filter: (pm.id = o.payment_method_id) Rows Removed by Join Filter: 394420 -> Index Scan using operation_date_created_type_state_key on operation o (cost=0.01..41.78 rows=1 width=8) (actual time=0.022..87.541 rows=10660 loops=1) Index Cond: ((date(created) >= '2015-04-05'::date) AND (date(created) <= '2015-04-05'::date) AND (type = 'deposit'::op_type)) -> Seq Scan on payment_method pm (cost=0.00..1.38 rows=38 width=12) (actual time=0.001..0.003 rows=38 loops=10660) -> Index Scan using pc_pkey on payment_classify pc (cost=0.00..0.58 rows=1 width=22) (actual time=0.002..0.002 rows=1 loops=10660) Index Cond: (id = pm.payment_classify_id)

Total runtime: 193.659 ms

Page 27: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Tuning target: Misc

● Clocksource

● acpi_pm vs. hpet vs. tsc

● /sys/devices/system/clocksource/clocksource0/available_clocksource

● /sys/devices/system/clocksource/clocksource0/current_clocksource

Page 28: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Clocksource: TSC

$ select o.payment_method_id,pc.name,count(o.payment_method_id) as cnt from operation o LEFT JOIN payment_method pm ON pm.id=o.payment_method_id LEFT JOIN payment_classify pc ON pc.id = pm.payment_classify_id WHERE (type='deposit')

AND (DATE(o.created)>='2015-04-05' AND DATE(o.created)<='2015-04-05') GROUP BY o.payment_method_id,pc.name QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=44.24..44.25 rows=1 width=26) (actual time=244.300..244.308 rows=20 loops=1) -> Nested Loop Left Join (cost=0.01..44.23 rows=1 width=26) (actual time=0.075..238.265 rows=10660 loops=1) -> Nested Loop Left Join (cost=0.01..43.64 rows=1 width=16) (actual time=0.064..211.236 rows=10660 loops=1) Join Filter: (pm.id = o.payment_method_id) Rows Removed by Join Filter: 394420 -> Index Scan using operation_date_created_type_state_key on operation o (cost=0.01..41.79 rows=1 width=8) (actual time=0.046..111.409 rows=10660 loops=1) Index Cond: ((date(created) >= '2015-04-05'::date) AND (date(created) <= '2015-04-05'::date) AND (type = 'deposit'::op_type)) -> Seq Scan on payment_method pm (cost=0.00..1.38 rows=38 width=12) (actual time=0.001..0.004 rows=38 loops=10660) -> Index Scan using pc_pkey on payment_classify pc (cost=0.00..0.58 rows=1 width=22) (actual time=0.002..0.002 rows=1 loops=10660) Index Cond: (id = pm.payment_classify_id)

Total runtime: 244.433 ms

Page 29: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Clocksource: ACPI_PM

$ select o.payment_method_id,pc.name,count(o.payment_method_id) as cnt from operation o LEFT JOIN payment_method pm ON pm.id=o.payment_method_id LEFT JOIN payment_classify pc ON pc.id = pm.payment_classify_id WHERE (type='deposit')

AND (DATE(o.created)>='2015-04-05' AND DATE(o.created)<='2015-04-05') GROUP BY o.payment_method_id,pc.name QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=44.24..44.25 rows=1 width=26) (actual time=1062.165..1062.188 rows=20 loops=1) -> Nested Loop Left Join (cost=0.01..44.23 rows=1 width=26) (actual time=0.104..1047.211 rows=10660 loops=1) -> Nested Loop Left Join (cost=0.01..43.64 rows=1 width=16) (actual time=0.091..962.252 rows=10660 loops=1) Join Filter: (pm.id = o.payment_method_id) Rows Removed by Join Filter: 394420 -> Index Scan using operation_date_created_type_state_key on operation o (cost=0.01..41.79 rows=1 width=8) (actual time=0.046..104.546 rows=10660 loops=1) Index Cond: ((date(created) >= '2015-04-05'::date) AND (date(created) <= '2015-04-05'::date) AND (type = 'deposit'::op_type)) -> Seq Scan on payment_method pm (cost=0.00..1.38 rows=38 width=12) (actual time=0.002..0.039 rows=38 loops=10660) -> Index Scan using pc_pkey on payment_classify pc (cost=0.00..0.58 rows=1 width=22) (actual time=0.003..0.004 rows=1 loops=10660) Index Cond: (id = pm.payment_classify_id)

Total runtime: 1062.316 ms

Page 30: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Bonus. Tuning target: Pgbouncer

● open files limit● nofile (/etc/security/limits.conf)● prlimit utility (util-linux)

● sysctl: net.ipv4.ip_local_port_range

Page 31: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Questions?

Page 32: Linux tuning for PostgreSQL at Secon 2015

Пенза, 24 апреля 20152015.secon.ru

Thanks!

Alexey Lesovsky

[email protected]