Flashing Up the Storage Layer
I. Koltsidas, S. D. Viglas (U of Edinburgh), VLDB 2008
Shimin Chen, Big Data Reading Group
Motivation: Flash Disks
- 64GB – 128GB SSDs available; as of Feb '08, Intel announced 80GB SSDs
- Flash disks vs. magnetic disks:
  - Same I/O interface: logical 512B sectors
  - No mechanical latency; I/O asymmetry; erase-before-write
  - Random reads 10X faster than magnetic disks
  - Random writes 10X slower than magnetic disks, especially on MLC
- Exploit flash disks for storage?
Architecture
- Flash disk as a cache for magnetic disk? Suboptimal for database workloads because of write inefficiency
- Flash disk and magnetic disk on the same level (this paper)
Problem Statement
- Page migrations (Storage Manager)
- Workload prediction / self-tuning
- Page replacement (Buffer Manager)
Outline
- Introduction
- Page placement
- Page replacement
- Experimental study
- Conclusion
Model
- Random read/write costs of flash and magnetic disks
- The page migration decision is always made while a page is in the buffer pool
- Migration cost == write cost
- The ideas are not new; the novel part is that logical I/Os are served by the buffer pool, so only a fraction of them are seen physically
- r, w: the read/write costs of the current disk; r', w': the costs of the other disk
- pg.C: a per-page counter holding the accumulated cost difference
Conservative Algorithm
- Migrate a page only after it has accumulated the cost of migrating to the other disk and back
- Considers only physical operations on pages
- 3-competitive with the optimal offline algorithm
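The conservative rule above can be sketched as follows (a minimal illustration, not the paper's code: the class, function names, and cost values are hypothetical, chosen to reflect the "reads 10X faster, writes 10X slower" asymmetry):

```python
# Hypothetical sketch of the conservative page-placement rule:
# accumulate the per-access cost difference between the current
# disk and the other disk in pg.C, and migrate only once the
# accumulated benefit pays for a migration there and back.

class Page:
    def __init__(self, on_flash):
        self.on_flash = on_flash
        self.C = 0.0  # pg.C: accumulated cost difference

# Assumed example costs (flash: fast reads, slow writes).
COST = {
    "flash":    {"r": 0.1, "w": 10.0},
    "magnetic": {"r": 1.0, "w": 1.0},
}

def access(page, is_write):
    cur = "flash" if page.on_flash else "magnetic"
    other = "magnetic" if page.on_flash else "flash"
    r, w = COST[cur]["r"], COST[cur]["w"]
    rp, wp = COST[other]["r"], COST[other]["w"]
    # How much cheaper this physical access would have been
    # on the other disk (negative if the current disk is better).
    page.C += (w - wp) if is_write else (r - rp)
    # Conservative: migrate only after saving the cost of moving
    # to the other disk (w') and possibly moving back (w).
    if page.C >= wp + w:
        page.on_flash = not page.on_flash
        page.C = 0.0
```

With these example costs, a write-heavy page on flash migrates to the magnetic disk after two writes (accumulated saving 18 exceeds the round-trip cost 11), while reads push the counter down and keep read-heavy pages on flash.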
Properties
- Not conservative on migrations
- Based on logical operations
Hybrid Algorithm
- Idea: consider both physical and logical operations, with more weight on the physical ones
- If a file has n pages and b of them are cached in the buffer pool, then Prob_miss = 1 - b/n
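One way to read the hybrid weighting (a hedged sketch; the function name is hypothetical): a logical access only turns into a physical I/O if it misses the buffer pool, so logical operations are discounted by Prob_miss = 1 - b/n while physical ones count in full:

```python
# Hypothetical sketch of the hybrid weighting: physical operations
# get full weight; a logical operation is weighted by the chance it
# would have been physical, i.e. Prob_miss = 1 - b/n for a file
# with n pages of which b are cached in the buffer pool.

def hybrid_weight(is_physical, n_pages, b_cached):
    if is_physical:
        return 1.0
    return 1.0 - b_cached / n_pages  # expected physical fraction
```

For example, with a 100-page file of which 75 pages are cached, a logical access contributes only 0.25 of a physical one, so physical operations naturally carry more weight, as the slide says.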
Outline
- Introduction
- Page placement
- Page replacement
- Experimental study
- Conclusion
Eviction Cost
- Evicting a dirty page incurs a write cost
- Fetching the page back in the future incurs a read cost
- Cost of evicting a page: the write cost (if dirty) plus the read cost of a future fetch, on the disk holding the page
Buffer Pool Organization
- Time segment: pages sorted on timestamp (LRU)
- Cost segment: pages sorted on cost of eviction
Impact of λ
As λ increases:
- The time segment shrinks and the cost segment grows
- Disk pages in the pool increase, flash pages decrease
- Flash pages are evicted first, so they are typically found only in the time segment
- Let Hm be the increase in the disk hit rate and Mf the increase in the flash miss rate; we want the benefit of the extra disk hits to outweigh the cost of the extra flash misses
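The two-segment pool can be sketched roughly as follows (a simplified illustration, not the paper's implementation: class and field names are hypothetical, λ is taken as the cost segment's share of capacity, and the cost segment is assumed non-empty when eviction is needed):

```python
# Hedged sketch of a two-segment buffer pool: recently used pages
# live in a time segment ordered by last access (LRU); pages aging
# out of it move to a cost segment, from which the cheapest-to-evict
# page (typically a clean flash page) is chosen as the victim.

from collections import OrderedDict

class SegmentedPool:
    def __init__(self, capacity, lam):
        self.capacity = capacity
        self.cost_size = int(lam * capacity)  # λ: cost-segment share
        self.time_seg = OrderedDict()  # page_id -> eviction cost, LRU order
        self.cost_seg = {}             # page_id -> eviction cost

    def access(self, page_id, evict_cost):
        # (Re-)insert the page at the MRU end of the time segment.
        self.time_seg.pop(page_id, None)
        self.cost_seg.pop(page_id, None)
        self.time_seg[page_id] = evict_cost
        self._rebalance()

    def _rebalance(self):
        # Demote LRU pages from the time segment into the cost segment.
        while len(self.time_seg) > self.capacity - self.cost_size:
            pid, c = self.time_seg.popitem(last=False)
            self.cost_seg[pid] = c
        # Over capacity: evict the cheapest page in the cost segment.
        while len(self.time_seg) + len(self.cost_seg) > self.capacity:
            victim = min(self.cost_seg, key=self.cost_seg.get)
            del self.cost_seg[victim]
```

Growing λ hands more of the pool to the cost segment, so more cheap (flash) pages get evicted and more expensive (disk) pages are retained, which is the trade-off the Hm/Mf condition is weighing.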
Outline
- Introduction
- Page placement
- Page replacement
- Experimental study
- Conclusion
Experimental Setup
- Implementation: buffer manager, storage manager, B+-trees for storing data
- Machine: 2.26GHz Pentium 4, 1.5GB RAM; Debian Linux, kernel 2.6.21
- Two magnetic disks (300GB Maxtor DiamondMax); one SSD (32GB Samsung MLC)
- Data is stored on 1 disk + 1 SSD (both raw devices)
Experimental Setup Cont'd
- Capacity of either disk is enough to hold all the data
- Metadata for files, pages, page mappings, and free space is not modeled
- B+-tree is 140MB, scattered across a 1.4GB address space
- Buffer pool is 20MB
Raw Performance: 1 million 4KB random accesses
Impact of Using Both Disks
- Conservative + LRU
- Query mix: read-only, write-only, read/write
- Each set of queries executed 15 times
Read-Only
Write-Only
Mixed
Page Placement Algorithms
Infrequently changing workload
Frequently changing workload
Buffer Pool Replacement
Conclusion
- Flash disk vs. magnetic disk
- Page migration and placement
- Page replacement
- Can be applied to databases and file systems (?)
Outline
- Introduction
- Page placement
- Page replacement
- Experimental study
- Conclusion