btrfs goals - oracle | oss.oracle.com - homemason/presentation/btrfs...data ordering • ext3 data...

22
Btrfs Goals General purpose filesystem that scales to very large storage Fast development cycle to attract developers and users early Feature focused, provide things other Linux filesystems cannot Administration focused, easy to run and very fault tolerant Perform well in a variety of workloads

Upload: phungdan

Post on 21-Apr-2018

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Goals

• General purpose filesystem that scales to very large storage

• Fast development cycle to attract developers and users early

• Feature focused, provide things other Linux filesystems cannot

• Administration focused, easy to run and very fault tolerant

• Perform well in a variety of workloads

Page 2: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Features

• Extent based file storage• Copy on write metadata and data• Space efficient packing of small files• Space efficient indexed directories• Integrity checksumming• Writable snapshots• Efficient incremental backups• Multiple device support• Full back references for all types of metadata• Offline conversion from Ext3

Page 3: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Status

• V0.16 recently released• Most releases happen 1-2 months apart• Generally usable in many workloads• Generally stable• Works against mainline kernels and 2.6.18 enterprise

kernels• Multi-device support functional

Page 4: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Key Storage

• Only three types of extents exist in Btrfs– Data– Btree– Super block

• Metadata is stored in key / value pairs in a btree• Items from multiple files and directories share the

same btree blocks• Special purpose btrees used for different functions

Page 5: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Snapshots

• As efficient as directories on disk• Every transaction is a new unnamed snapshot• Every commit schedules the old snapshot for deletion• Snapshots use a reference counted COW algorithm to track blocks• Snapshots can be written and snapshotted again• COW leaves old data in place and writes the new data to a new location

Page 6: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 7: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 8: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Multi-device Support

• Devices are added into a pool of available storage• New logical address space is added with a specific

RAID configuration and data storage flags– System (used by the volume management code)– Metadata– Data– Raid0, raid1, raid10, single-spindle-dup

• Space is allocated from the storage pool in large chunks

• Devices can be mixed in size and speed

Page 9: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Multi-device Support

• All extent pointers and btree pointers use logical addresses

• Super block contains a bootstrap mapping with enough information to translate the addresses used by the chunk tree

• Single translation and IO path for all extents• Storage can be restriped or dynamically reconfigured• Storage pools use the same transaction subsystem

as the rest of the FS– Much lower overhead when reconfiguring storage

Page 10: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Multi-device Support

Page 11: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Directories

• Indexed twice to improve directory read speeds• First index based on name for fast lookups• Second index based on directory sequence number

– Returns filenames in something close to inode number order– Tree defrag responsible for keeping inode number order

close to disk order

Page 12: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 13: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 14: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Data Ordering

• Ext3 data ordering writes all dirty data before each commit– Any fsync waits for all dirty data to reach disk

• Ext4 data ordering writes all dirty non-delalloc data before each commit– Much better than Ext3

• XFS updates inodes only after the data is written• Btrfs updates inodes, checksums, and extent pointers

after the data is written– Allows atomic writes– Makes transactions independent of data writeback

Page 15: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Synchronous Operations

• COW transaction subsystem is slow for frequent commits– Forces recow of many blocks– Forces significant amounts of IO writing out extent allocation

metadata

• Simple write ahead log added for synchronous operations on files or directories

• File or directory Items are copied into a dedicated tree – One log tree per subvolume– File back refs allow us to log file names without the directory

Page 16: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Synchronous Operations

• The log tree uses the same COW btree code as the rest of the FS

• The log tree uses the same writeback code as the rest of the FS, and uses the metadata raid policy.

• Commits of the log tree are separate from commits in the main transaction code.– fsync(file) only writes metadata for that one file– fsync(file) does not trigger writeback of any other data blocks

Page 17: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 18: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 19: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 20: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty
Page 21: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Mixed Fsync Workload

• 8 Processes creating, writing and fsyncing 20k files• 1 Process creating 30 copies of the linux kernel

sources in sequence

Kernel tree I/O Rate

83MB/s 15,680Ext4 122MB/s 1280Ext3 66.85MB/s 3200

Fsync Creates

Btrfs

Page 22: Btrfs Goals - Oracle | oss.oracle.com - Homemason/presentation/btrfs...Data Ordering • Ext3 data ordering writes all dirty data before each commit – Any fsync waits for all dirty

Btrfs Problems

• ENOSPC• Online, offline fsck• http://btrfs.wiki.kernel.org/index.php/Project_ideas