linux io internals for database administrators (scale 2017 and pgday nordic 2017 version)

Post on 21-Apr-2017

125 Views

Category:

Engineering

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Linux IO internalsfor database administrators

Ilya Kosmodemianskyik@dataegret.com

1 Why this talk

• Linux is a most common OS for databases• DBAs often run into IO problems• Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

2 The main IO problem for databases

• How to maximize page throughput between memory anddisks

• Things involved:I DisksI MemoryI CPUI IO SchedulersI FilesystemsI Database itself

• IO problems for databases are not always only about disks

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

3 A typical database

DRAM

Disks

Shared memory

Database

Linux

Page cache

User space

Kernel space

WAL bu�er

WAL Data�le

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

4 Key things about such workload

• Shared memory segment can be very large• Keeping in-memory pages synchronized with disk generateshuge IO

• WAL should be written fast and safe• One and every layer of OS IO stack involved

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

5 Memory allocation and mapping

CPU L1

MMU TLB

Memory

L2 L3

pagetable

Virtual addressing

Translation

Physical addressing

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

6 DBAs takeaways:

• Database with huge shared memory segment benefits fromhuge pages

• But not from transparent huge pagesI Databases operate large continuous shared memory segmentsI THP defragmentation can lead to severe performance

degradation in such cases

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

7 Freeing memory

Page cache

Swap

Free list Free list Free list

alloc()free()

page-out

vm.min_free_kbytes

reclaim

vm.swapiness

OOM-killer

vm.panic_on_oom

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

8 Page-out

• Page-out happens if:I someone calls fsyncI 30 sec timeout exceeded (vm.dirty_expire_centisecs)I Too many dirty pages (vm.dirty_background_ratio and

vm.dirty_ratio)

• It is reasonable to tune vm.dirty_* on rotating disks withRAID controller, but helps a little on server class SSDs

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

9 DBAs takeaways:

• vm.overcommit_memoryI 0 - heuristic overcommit, reduces swap usageI 1 - always overcommitI 2 - do not overcommit (vm.overcommit_ratio = 50 by

default)

• vm.min_free_kbytes - reasonably high (can be easy 1000000on a server with enough memory)

• vm.swappiness = 1I 0 - swap disabledI 60 - defaultI 100 - swap preffered instead of other reaping mechanisms

• Your database would not like OOM-killer

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

10 OOM-killer: plague and cholera

• vm.panic_on_oom effectively disables OOM-killer, but that isprobably not the result you desire

• Or for a certain process: echo -17 > /proc/12465/oom_adjbut again

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

11 Filesystems: write barriers

Journal

Kernel bu�er

Journalated fylesystem

Data

Journal entryData

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

12 DBAs takeaways:

• ext4 or xfs• Disable write barrier (only if SSD/controller cache is protectedby capacitor/battery)

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

13 IO stack

Database memory

Page cacheVFSEXT4

Block device interfaceDisks

Direct IO

BIO Layer

Block IO

Request Layer Elevator/IO Scheduler

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

14 Elevators: before 2.6 kernel

• Linus Elevator - the only one in times of 2.4• merging and sorting request queues• Had lots of problems• The Linux Scheduler: a Decade of Wasted Cores – Lozi et al.2016

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

15 Elevators: between 2.6 and early 3.*

• CFQ - universal, default one• deadline - rotating disks• noop or none - then disks throughput is so high, that it cannot benefit from keen scheduling

I PCIe SSDsI SAN disk arrays

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

16 Elevators: 3.13 and newer

• Effectiveness of noop clearly shows ineffectiveness of others,or ineffectiveness of smart sorting as an approach

• blk-mq scheduler was merged into 3.13 kernel• Much better deals with parallelism of modern SSD - basicallyseparate IO queue for each CPU

• The best option for good SSDs right now

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

17 Thanks

to my collegues Alexey Lesovsky and Max Boguk for a lot ofresearch on this topic

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

18 Questions?

ik@dataegret.com

dataegret.com

Why this talk

• Linux is a most common OS for databases

• DBAs often run into IO problems

• Most of the information on topic is written by kerneldevelopers(for kernel developers) or is checklist-style

• Checklists are useful, but up to certain workload

dataegret.com

top related