SSD for Your Database, Peter Zaitsev (Percona)
DESCRIPTION
Talk by Peter Zaitsev at HighLoad++ 2014.
TRANSCRIPT
![Page 1: SSD for Your Database, Peter Zaitsev (Percona)](https://reader035.vdocument.in/reader035/viewer/2022062303/5585ba3dd8b42a695a8b4c63/html5/thumbnails/1.jpg)
Peter Zaitsev, CEO, Percona
November 1, 2014
HighLoad++ 2014
Moscow, Russia
SSD/Flash for Modern Databases
www.percona.com
Percona
We love Open Source Software
• Percona Server
• Percona XtraBackup
• Percona XtraDB Cluster
• Percona Toolkit
We want to help you to succeed with MySQL and Beyond
• Consulting
• Support
• Managed Services
In this Presentation
Flash technology overview
Review some of the available technology
What does this mean for databases?
Specific opportunities for MySQL
Before SSDs
There were HDDs
Good at Sequential Reads/Writes
Response Time (RT) = Seek Time + Rotational Latency
Reads/Writes – Similar Latency
No Specific Write Limits
Retain data for a long time
One IO Request at a Time (no parallelism)
Low cost per GB
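The response-time formula on this slide can be turned into a rough estimate of random IOPS for a single spindle. The sketch below is only an illustration: the 4 ms average seek and 7200 RPM figures are assumed example values, not numbers from the talk, and the average rotational latency is taken as half a revolution.

```python
def hdd_random_io_ms(avg_seek_ms: float, rpm: int) -> float:
    """Average service time of one random IO: seek time plus half a rotation."""
    full_rotation_ms = 60_000 / rpm               # one revolution, in milliseconds
    avg_rotational_latency_ms = full_rotation_ms / 2
    return avg_seek_ms + avg_rotational_latency_ms


# Hypothetical drive: 7200 RPM, 4 ms average seek (illustrative values only).
rt_ms = hdd_random_io_ms(avg_seek_ms=4.0, rpm=7200)

# With one request served at a time, random IOPS is simply 1 / RT.
iops = 1000 / rt_ms
print(f"{rt_ms:.2f} ms per random IO, roughly {iops:.0f} IOPS")
```

Because the drive serves one request at a time, adding concurrency does not raise this ceiling for a single spindle.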
RAID and SAN
Using Many HDDs together
Caching Reads
Buffering Writes (Writeback Cache)
Better Sequential Read/Write speed
Better throughput at high concurrency
Higher IO latencies for uncached IO
Flash Revolution
Use Flash chips instead of platters
No moving parts
No seeks
NAND Flash
Cell
Page/Read Block
Erase Block
Write but no overwrite
Wears with writes (erases)
Writing to the Flash
• Erase: set all bits to “1111111…”
• Write: set some of the bits to 0: “0100111..”
• Change zero to one: impossible. Do Erase, then Write
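The erase/write rules on this slide can be modeled in a few lines. This is a toy sketch under simplifying assumptions: a `bytearray` stands in for an erase block, and the helper names `erase` and `program` are made up for illustration. Real NAND programs whole pages and erases much larger blocks.

```python
ERASED = 0xFF  # an erased byte has all bits set to 1


def erase(block: bytearray) -> None:
    """Erase the whole block: every bit goes back to 1."""
    for i in range(len(block)):
        block[i] = ERASED


def program(block: bytearray, offset: int, data: bytes) -> None:
    """Write data; only 1 -> 0 bit transitions are possible without an erase."""
    for i, new in enumerate(data):
        old = block[offset + i]
        if new & ~old & 0xFF:  # some bit would have to go from 0 back to 1
            raise ValueError("cannot flip a bit from 0 to 1 without an erase")
        block[offset + i] = old & new


block = bytearray(4)
erase(block)
program(block, 0, bytes([0b01001110]))      # fine: only clears bits

try:
    program(block, 0, bytes([0b11001110]))  # needs a 0 -> 1 flip
    overwrite_ok = True
except ValueError:
    overwrite_ok = False

erase(block)                                # erase first ...
program(block, 0, bytes([0b11001110]))      # ... then the write succeeds
```

This is why an in-place overwrite on flash turns into "erase, then write", and why the controller has to manage where new data lands.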
Types of NAND Flash
From AnandTech:
Flash Storage Design
Cache
Battery/Super Capacitor
Controller + Complex Firmware
Built-in Parallelism
Flash Controller Tasks
Write wear leveling
Garbage collection
Error correction
Bad block mapping
Read scrubbing
Read disturb management
Encryption
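As a rough illustration of the first task in this list, write wear leveling, the sketch below always hands out the free erase block with the fewest erases so far. The `WearLeveler` class is hypothetical. Real controller firmware is far more complex and also has to deal with static data, garbage collection and bad blocks.

```python
class WearLeveler:
    """Toy allocator: always pick the free erase block with the fewest erases."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks))

    def allocate(self) -> int:
        # Least-worn block first; ties are broken by block number.
        block = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(block)
        return block

    def release(self, block: int) -> None:
        """Erase a block holding stale data and return it to the free pool."""
        self.erase_counts[block] += 1
        self.free.add(block)


leveler = WearLeveler(num_blocks=4)
for _ in range(100):
    b = leveler.allocate()
    leveler.release(b)

print(leveler.erase_counts)
```

Even though the loop keeps rewriting "the same" logical data, the erases end up spread evenly across all blocks rather than wearing out one of them.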
Flash Properties
Lots of IOs per device! (100K+)
Less random IO penalty
Writes more expensive than reads (but can be faster)
Limited by amount of writes
Limited retention
Concurrent execution on single device
Fast write acknowledgement (safe or not)
Can burst writes
Flash Interface Designs
DIMM
PCI-E
SFF-8639
SATA/SAS
FC and Network
Transitioning
AHCI → NVMe
AHCI vs NVMe
• Source: AnandTech.com
SanDisk ULLtraDIMM
HGST Virident
SanDisk FusionIO
Intel P3700
Intel 730 (SATA)
mSATA
M.2 Interface
Violin Memory
“Consumer” vs “Enterprise”
Performance
Endurance
Durability
Retention
Encryption
Not your HDD
All HDDs are the same; All SSDs are different
Evaluation
Performance changes over time
Empty Space Matters
Complex internals
Watch stability carefully
How Flash Fails
Clearly defined end of life (EOL) based on the amount written (but drives can often handle a lot more)
One day… it’s gone
“Power Loss Protection”
Internal ECC and redundancy
To RAID or not to RAID?
More valuable for consumer grade
Watch for good Flash support
RAID controller logic may slow things down
Use a redundant array of inexpensive servers instead?
Redundancy
Device internal redundancy
Hardware RAID
Software RAID
Filesystem “RAID”
OS Support
Flash support is actively being improved
TRIM
Sparse Files
Flash And Databases
Database History
Most were designed in the HDD era
Optimize for sequential IO
Count on cheap sequential writes
RAID, BBU to improve performance
It’s time for Flash
Your OLTP Database should live on Flash
But What Flash?
Pick a flash type that is right for your application
IO vs Memory
Warmup
Much faster warmup times
Even if the database fits in memory, SSD might be justified
Tolerate more IO bound load
HDD
• 5ms
• Can do 20 IOs within a 100ms response time (non-parallel)
Flash
• 0.1ms
• Can do 1000 IOs within a 100ms response time (non-parallel)
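The figures on this slide follow from dividing the response-time budget by the per-IO latency, assuming the IOs are issued strictly one after another. A minimal sketch, with a hypothetical helper name:

```python
def serial_io_budget(response_time_ms: float, io_latency_ms: float) -> int:
    """Approximate number of dependent (non-parallel) IOs that fit in the budget."""
    return round(response_time_ms / io_latency_ms)


hdd_ios = serial_io_budget(response_time_ms=100, io_latency_ms=5)      # HDD at 5 ms
flash_ios = serial_io_budget(response_time_ms=100, io_latency_ms=0.1)  # flash at 0.1 ms
print(hdd_ios, flash_ios)
```

A query that needs a few hundred uncached page reads therefore blows a 100ms budget on an HDD but fits comfortably on flash.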
Endurance
Might be a top consideration
Endurance Math
HGST FlashMax III 2200GB
• 4400GB/day over 5 Years
• 1400MB/sec peak writes
• 66 days at peak write throughput
Crucial M500 960GB
• 72TB total life time writes
• 400MB/sec write
• 52 hours at peak write throughput
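The endurance arithmetic can be reproduced as rated lifetime writes divided by peak write throughput. The sketch below assumes decimal units (1 GB = 10^9 bytes), which is a guess about how the slide's figures were derived. Under that assumption the HGST result lands at about 66 days and the Crucial result at about 50 hours. The slide's 52 hours is consistent with binary units for the 72TB figure.

```python
MB, GB, TB = 10**6, 10**9, 10**12   # decimal units (an assumption)
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def seconds_to_exhaust(lifetime_writes_bytes: float, peak_write_bytes_per_s: float) -> float:
    """Time needed to use up the rated write endurance at full write speed."""
    return lifetime_writes_bytes / peak_write_bytes_per_s


# HGST FlashMax III 2200GB: 4400 GB/day for 5 years, 1400 MB/s peak writes.
hgst_lifetime_bytes = 4400 * GB * 365 * 5
hgst_days = seconds_to_exhaust(hgst_lifetime_bytes, 1400 * MB) / SECONDS_PER_DAY

# Crucial M500 960GB: 72 TB total lifetime writes, 400 MB/s writes.
m500_hours = seconds_to_exhaust(72 * TB, 400 * MB) / SECONDS_PER_HOUR

print(f"HGST: {hgst_days:.1f} days, M500: {m500_hours:.1f} hours at peak write rate")
```

The gap between days and hours is the point of the slide: a consumer drive can be worn out very quickly by a write-heavy database.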
Databases and Flash
How do we optimize databases to use Flash best?
“Torn Page” problem
Flash can avoid this with little cost due to internal design
FusionIO NVMFS (Atomic Writes)
Copy-on-Write File Systems
• ZFS
• BTRFS
Filesystem-level data journaling (less preferred)
• data=journal for EXT4
skip-innodb-doublewrite
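To see what the "torn page" problem is and what a doublewrite buffer protects against, the toy model below writes a 16KB page as four 4KB sectors and simulates a crash after the first one. The page layout, the helper names and the 4KB atomic unit are all assumptions for illustration. The checksum is `zlib.crc32` for convenience and is not InnoDB's on-disk format.

```python
import zlib

PAGE_SIZE = 16 * 1024   # InnoDB's default page size
SECTOR = 4 * 1024       # assumed atomic write unit of the device


def make_page(payload: bytes) -> bytes:
    """Toy page: 4-byte CRC32 of the body, then the body padded to PAGE_SIZE."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def is_valid(page: bytes) -> bool:
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first sectors of the new page hit the media."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


old_page = make_page(b"A" * 10_000)
new_page = make_page(b"B" * 10_000)

on_disk = torn_write(old_page, new_page, sectors_written=1)
torn_detected = not is_valid(on_disk)       # mixed old/new content fails the checksum

# Doublewrite idea: a full copy of the new page was persisted elsewhere first,
# so recovery can restore it over the torn page.
doublewrite_copy = new_page
recovered = doublewrite_copy if torn_detected and is_valid(doublewrite_copy) else on_disk
```

If the storage stack guarantees that a full page write is atomic (atomic writes or a copy-on-write filesystem), the torn state cannot occur and the extra copy is unnecessary, which is the case the slide's doublewrite setting targets.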
Fast IO Path
Bypass Caching: O_DIRECT
Native Asynchronous IO
Efficient Checksumming
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
IO Cost Accounting
Sequential vs Random IO balance
IO vs CPU Balance
Smaller page sizes might make sense
• innodb_page_size=4K
Less Pre-fetching
Most pre-fetched data must be used
Often best to try it out
Less merging on flushing
Do not assume flushing multiple sequential dirty pages costs the same as flushing one
innodb_flush_neighbors=0
Less Space on Disk
InnoDB Compression (2x typical)
TokuDB Compression (5-10x typical)
Archiving data off OLTP System
Fewer Writes on Flash
Hybrid Flash/HDD System
Transaction logs and other logs on HDD with RAID and BBU
Small temporary objects on tmpfs
innodb_log_file_size=<LARGE>
Logs on RAID can be fast
Single Intel 730 Sysbench
IOPS
Is Flash Too Fast?
• Multiple instances might scale better
Other Thoughts
Host hardware and OS matter, especially with high end flash
Virtualization has higher relative overhead
Network has higher relative overhead
Peter Zaitsev
[email protected]
@PeterZaitsev
https://www.linkedin.com/in/peterzaitsev
Thank You!