fast 15’ authors: chanman lee, dongho sim, jooyoung hwang, and sangyeun cho, samasung eletronics...
TRANSCRIPT
FAST 15’Authors: Chanman Lee, Dongho Sim, Jooyoung Hwang, and Sangyeun Cho, Samasung Eletronics Co.
F2FS: A New File System for Flash Storage
Presented by YouJune Go
2015. 08. 18.System Software Laboratory
Department of CSE@POSTECH
Introduction• NAND flash memory has been widely used in various devices.
• Server system started utilizing flash devices as their primary storage.
• BUT, there are several limitations on flash memory.• erase-before-write, write on erased block sequentially and
limited write cycles per erase block.
• Random writes are not good for flash storage devices.• Free space fragmentation.
• Sustained random write performance degrades.• Lifetime reduction.
• Sequential write oriented file system• Log structured file system, copy-on-write file system.
Key Design Considerations
• Flash-friendly on-disk layout.• Cost-effective index structure.• Multi-head logging.• Adaptive logging.• Fsync acceleration with roll-forward recovery.
• Flash awareness• All the file system metadata are located together for
locality.• File system cleaning is done in a unit of section(FTL’s GC
unit).
• Cleaning Cost Reduction• Multi-head logging for hot/cold data separation.
Flash-friendly On-disk Layout
• Superblock• Partition information, F2FS parameters(not changeable)
• Check point • file system status, bitmaps for valid NAT/SIT sets, orphan
inode list, summary entries of active segment.• Segment Information Table(SIT)
• valid segments and bitmap information in Main area. • Node Address Table(NIT)
• block address table to locate all the node blocks stored in Main area.
• Segment Summary Area(SSA) • summary entries representing the owner information of all
blocks in the Main area(parent inode number and its node/data offset).
• Main Area : Block is typed to be node or data.• Node block stores inode or indices of data blocks(not data).• Data block contains either directory or user file data.
On-disk layout in detail
• Update propagation issue; wandering tree problem.• One log.
LFS index structure
SB
CP
Inode map
Segment Usage
Segment Summary
Inode fordirectory
Inode forregualr file
Directory data
File data
IndirectPointer block
DirectPointer block
File data… …
• Restrained update propagation: node address translation method
• Multi-head log
F2FS index structure
SB
CP
Segment Usage
Segment Summary
Inode fordirectory
Inode forregualr file
Directory data
File data
IndirectPointer block
DirectPointer block
File data… …
NAT
File look-up operation example
• Assuming a file “/dir/file”
inodelocatio
n
NAT
root 100
root inode10
0
Search dentry
named dir and get
inode num
dir 200 dir inode200
Search dentry
named file and get
inode num
file 300
file inode300
Get an actual data
of file consquently
Multi-head logging• Data temperature classification.
• Node > Data• Direct node > Indirect node• Directory > User file
• Separation of multi-head logs in NAND flash.• Zone-aware log allocation for set-associative FTL mapping.• Multi-stream interface.
• Cleaning is a process to reclaim scattered and invalidated blocks for free segments for further logging.• occurred when capacity is filled up. • done in the unit of a section.
• Triggered in Foreground and Background process.
• Cleaning procedure(1) Victim selection: get a victim section through referencing Segment Info Table(SIT).
Greedy algorithm for foreground cleaning job. Cost-benefit algorithm for background cleaning job.
(2) Valid block check: load parent index structures of there-in data identified from Segment Summary Area.(3) Migration: move valid blocks to another spaces in Main area. (4) Mark victim section as “pre-free”
Pre-free sections are freed after the next checkpoint is made.
Cleaning
• Foreground cleaning• identify valid block quickly by using validity bitmaps in SIT
information.• after identification, F2FS retrieves parent node blocks containing
their indices from the SSA information and moves them to free logs.
• Background cleaning• does not issue actual I/O to migrate valid blocks, instead F2FS
loads the blocks into page cache and marks them as dirty (lazy migration).
• background cleaning is not kicked in when foreground cleaning process is in progress.
Foreground and Background Cleaning
• To reduce cleaning cost at highly aged conditions, F2FS changes write policy dynamically. • Normal logging (Append logging, logging to clean segments)
Need cleaning operations if there is no free segment. Cleaning causes mostly random read and sequential writes.
• Threaded logging (logging to dirty segments) Reuse invalid blocks in dirty segments. No need cleaning. Cause random writes.
• Switching between normal logging and threaded logging depending on predefined value k(ratio of clean segments).• If k > 5%, then normal logging. Otherwise, threaded logging.
Adaptive Logging
segment
Threaded logging writes data into invalid blocks in segment
• Checkpointng• implemented to provide a consistent recovery point from sudden
poweroff or system crash.• events like sync, umount and foreground cleaning, F2FS triggers
checkpoint procedures. (1) All dirty node and dentry blocks in the page cache are flushed. (2) It suspends ordinary writing activities. (3) File system metadata such as NAT, SIT and SSA are written to their dedicated areas on the disk. (4) Finally, F2FS writes a checkpoint pack, consisting of the following informa- tion.
Header and Footer NAT and SIT bitmaps NAT and SIT journals Summary blocks of active segments Orphan blocks
Sudden Power Off Recovery
• After a sudden power-off, F2FS rolls back to the latest consistent checkpoint.• Maintains shadow copy of checkpoint, NAT, SIT blocks.• Recovers the latest checkpoint.• Keeps NAT/SIT journal in checkpoint to avoid NAT, SIT writes.
Roll-back Recovery
Main areaSB
CP
NAT SIT SSA
0 1
0 1
NAT/SIT journalin
g
• Fsync handling• On fsync, checkpoint is not necessary.• Only direct node blocks and data file are written with fsync mark.
• Roll-forward recovery procedure* Denote N refers to the log position of the last stable checkpoint.(1) F2FS collects the direct node block having the special flag
located in N+n (n: the number of blocks updated since the last checkpoint).
(2) then loads the most recently written node blocks, named N-n, into the page cache.
(3) And compares the data indices in between N-n and N+n.(4) Finally, if F2FS detects different data indices, then it refreshes
the cached node block with the new indices stored in N+n and finally marks them as dirty.
Roll-forward Recovery
Evaluation• Experimental setup
• Mobile and server systems.• Performance comparison with ext4, btrfs, nilfs2.• Values in parentheses meaning seq-rd, seq-wr,
rand-rd, rand-wr in MB/s
Source: the paperb
Mobile Benchmark• In F2FS, more than 90% of writes are sequential.• F2FS reduces write amount per fsync by using roll-forward
recovery. • BTRFS and NILFS2 performed poor than ext4
• BTRFS: heavy indexing overheads, NILFS2: periodic data flush• For Iozone-RW, BTRFS, NILFS2 write 15%, 41% more I/Os than Ext4,
repectively.• For Iozone-RR, BTRFS has 50% more I/Os than other file systems.
Source: the paperb
Server Benchmark• Performance gain of F2FS over ext4 is more on SATA SSD than
on PCIe SSD• Varmail: 2.5x on the SATA SSD and 1.8x on the PCIe SSD• Oltp: 16% on the SATA SSD and 13% on the PCIe SSD
• Discard size matters in SATA SSD due to interface overhead.• When using small discard(256KB) for F2FS, fileserver performance is
degraded by 18%.
Source: the paperb
Multi-head Logging• Using more logs gives better hot and cold data separation• 2 logs: node, data• 4 logs: hot node, warm/cold node, hot data, warm/cold data.
Source: the paperb
Adaptive Logging Performance• Adaptive logging gives graceful performance degradation under
highly aged volume conditions.• Fileserver test on SATA SSD(94% util)
• Sustained performance improvement: 2x/3x over ext4/btrfs• Iozone test on eMMC(100% util)
• Sustained performance is similar to ext4
Source: the paperb
Conclusion• F2FS features
• Flash friendly on-disk layout align FS GC unit with FTL GC unit,• Cost-effective index structure restrain write propagation,• Multi-head logging cleaning cost reduction,• Adaptive logging graceful performance degradation in aged
condition,• Roll-forward recovery fsync acceleration.
• F2FS shows performance gain over other Linux file systems.• 3.1x(iozone) and 2x (SQLite) speedup over ext4,• 2.5x(SATA SSD) and 1.8x(PCIe SSD) speedup over
ext4(varmail)
Thank you, Q&A
BACKUP SLIDES