The Write-Anywhere-File-Layout (WAFL)

Ohad Rodeh

The Write-Anywhere-File-Layout (WAFL) – p.1/36

Introduction

This lecture is based on "File System Design for an NFS File Server Appliance" by Dave Hitz, James Lau, and Michael Malcolm, Proceedings of the USENIX Winter 1994 Technical Conference
The WAFL design is used today in Network Appliance filers

Appliance for NFS

NFS is the Network File System
   1. A standard protocol to access a remote file system
   2. Create/delete files
   3. Read/write
An appliance is a special-purpose device

Main ideas

New file-system idea
   1. Write-Anywhere-File-Layout (WAFL)
   2. Create snapshots efficiently
   3. Uses a copy-on-write technique
   4. Minimizes the disk space that snapshots consume
Snapshots are used to eliminate the need for file-system consistency checks after an unclean shutdown

Design goals

WAFL is to be used inside an NFS appliance
Tuned specifically for NFS
Increased write workload due to problematic caching at the client
WAFL should:
   1. Provide fast NFS service
   2. Support large file systems (tens of GB) that grow dynamically as disks are added. This simplifies management; users don’t want multiple partitions.
   3. Provide high performance while supporting RAID
   4. Support RAID in order to tolerate failed disks gracefully
   5. Restart quickly after a crash. Traditional fsck is too slow.

Snapshots

Snapshot: a read-only copy of the entire file system.
Why use snapshots?
   1. Users can access snapshots through NFS to recover files that they have accidentally changed or removed
   2. System administrators can use snapshots to create backups safely from a running system
   3. WAFL uses Snapshots internally so that it can recover quickly from crashes

Structure

WAFL is similar in many ways to FFS
It uses 4 KB blocks with no fragments
The i-node is similar to FFS, with some exceptions
   1. Contains 16 block pointers
   2. All the block pointers refer to blocks at the same level
   3. I-nodes for files smaller than 64 KB use the 16 block pointers to point to data blocks
   4. I-nodes for files larger than 64 KB point to indirect blocks which point to actual file data
   5. I-nodes for even larger files point to doubly indirect blocks
   6. For very small files, data is stored in the i-node itself
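
The pointer rules above can be sketched as a small calculation. This is an illustrative sketch, not WAFL's code: the 1024 pointers per indirect block (4-byte pointers in a 4 KB block), the 64-byte inline threshold, and the function name `indirection_level` are assumptions that do not appear in the slides.

```python
BLOCK_SIZE = 4 * 1024      # 4 KB blocks, no fragments (from the slide)
INODE_POINTERS = 16        # block pointers per i-node (from the slide)
PTRS_PER_BLOCK = 1024      # assumption: 4-byte pointers in a 4 KB block
INLINE_LIMIT = 64          # assumption: data this small fits in the i-node


def indirection_level(file_size: int) -> int:
    """Return -1 for inline data, 0 for direct pointers, 1 for single
    indirect, 2 for double indirect. All 16 pointers share one level."""
    if file_size <= INLINE_LIMIT:
        return -1
    capacity = INODE_POINTERS * BLOCK_SIZE   # 64 KB reachable at level 0
    level = 0
    while file_size > capacity:
        capacity *= PTRS_PER_BLOCK           # each level multiplies the reach
        level += 1
    return level
```

Because every pointer in an i-node sits at the same level, the level is a function of file size alone, which is the main difference from the mixed direct/indirect pointers of FFS.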

Meta-data files

WAFL stores meta-data in files
There are three meta-data files
   1. I-node file, contains the i-nodes for the file system
   2. Block-map file, identifies free blocks
   3. I-node-map file, identifies free i-nodes
The term map is used instead of bit map because these files use more than one bit for each entry

Why use files to hold meta-data?

Allows writing meta-data blocks anywhere on disk
Write-anywhere allows operating efficiently with RAID
   1. WAFL works with RAID-4
   2. Schedules multiple writes to the same RAID stripe whenever possible
   3. Avoids the 4-to-1 short-write penalty
Makes it easy to increase the size of the file system on the fly
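
One common reading of the 4-to-1 short-write penalty is that updating a single block in a RAID-4 stripe costs four disk I/Os, while a full-stripe write needs no reads at all. The sketch below illustrates that reading with XOR parity; the function names are hypothetical and the I/O counts are a simplified model, not measurements.

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length data blocks into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after a single-block update: P' = P xor old xor new."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def small_write_ios() -> int:
    # read old data, read old parity, write new data, write new parity
    return 4


def full_stripe_ios(data_disks: int) -> int:
    # parity is computed from data already in memory: writes only
    return data_disks + 1
```

With three data disks, a full-stripe write stores three blocks in four I/Os, whereas three independent short writes would need twelve.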

Why use files to hold meta-data? II

When a new disk is added
   1. The server automatically increases the sizes of the meta-data files
   2. The system administrator can increase the number of i-nodes in the file system manually if the default is too small

Tree of blocks

A WAFL file system is best thought of as a tree of blocks

Creating a snapshot

WAFL creates a Snapshot by duplicating the root i-node that describes the i-node file
WAFL avoids changing blocks in a Snapshot by writing new data to new locations on disk
Snapshot creation is very quick

Updates propagate up the tree

To write a block to a new location, the pointers in the block’s ancestors must be updated, which requires them to be written to new locations as well.
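
The copy-on-write behaviour described above can be sketched with a toy tree. This is a minimal illustration, not WAFL's data structures: the `Block` class, `cow_write`, and `read` are hypothetical names, and Python objects stand in for disk blocks.

```python
class Block:
    """A block: either data or a list of child pointers."""

    def __init__(self, children=None, data=None):
        self.children = list(children or [])
        self.data = data


def cow_write(root: Block, path: list[int], data: str) -> Block:
    """Return a new root in which the leaf at `path` holds `data`.
    Every block on the path is copied; all other blocks are shared."""
    if not path:
        return Block(data=data)              # new leaf at a new location
    new_children = list(root.children)
    new_children[path[0]] = cow_write(root.children[path[0]], path[1:], data)
    return Block(children=new_children)      # rewritten ancestor


def read(root: Block, path: list[int]) -> str:
    for index in path:
        root = root.children[index]
    return root.data


leaves = [Block(data=f"v{i}") for i in range(4)]
active = Block(children=[Block(children=leaves[:2]), Block(children=leaves[2:])])
snapshot = active                            # a snapshot is just a kept root
active = cow_write(active, [0, 1], "new")
```

After the write, `snapshot` still reads the old value while `active` reads the new one, and the untouched subtree is shared between the two roots. That sharing is why duplicating the root is enough to create a snapshot.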

Coalescing block modifications

Each NFS request causes modifications to many blocks
WAFL would be very inefficient if it wrote this many blocks for each NFS write request.
Instead, gather up many hundreds of NFS requests before scheduling a write episode.
During a write episode:
   1. Allocate disk space for all the dirty data in the cache
   2. Schedule the required disk I/O
   3. Commonly modified blocks, such as indirect blocks and blocks in the i-node file, are written only once per write episode instead of once per NFS request.
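
The saving from coalescing can be sketched by counting block writes. This is a simplified model under stated assumptions: each write dirties one data block, one indirect block, and one i-node-file block, with 1024 pointers per indirect block and 32 i-nodes per i-node-file block (both figures are assumptions, as is the name `blocks_dirtied`).

```python
def blocks_dirtied(file_id: int, block_no: int) -> set[tuple]:
    """Blocks touched by one write: data, its indirect block, i-node block."""
    return {
        ("data", file_id, block_no),
        ("indirect", file_id, block_no // 1024),
        ("inode_file", file_id // 32),
    }


requests = [(7, n) for n in range(300)]      # 300 writes to one file

# Writing every dirtied block for every request:
per_request_writes = sum(len(blocks_dirtied(f, b)) for f, b in requests)

# One write episode: the cache holds each dirty block once.
dirty_cache = set()
for f, b in requests:
    dirty_cache |= blocks_dirtied(f, b)
per_episode_writes = len(dirty_cache)
```

In this model the shared indirect block and i-node-file block are written once instead of 300 times, so the episode needs 302 block writes rather than 900.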

Consistency and recovery

WAFL creates checkpoints once every 10 seconds
A checkpoint is a snapshot that is not accessible to users
Between checkpoints:
   1. NFS requests are recorded in the log
   2. The log is on NVRAM, so access is fast
   3. Blocks that are in use are never overwritten
   4. Ensures that the previous checkpoint remains unchanged

Consistency and recovery II

In case of crash:
   1. Go to latest checkpoint
   2. Replay the log
File system state advances atomically between checkpoints
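
The recovery procedure above can be sketched with a toy server. This is a minimal model, not WAFL's implementation: the `Filer` class and its method names are hypothetical, a dictionary stands in for the file system, and a deep copy stands in for the atomic switch to a new checkpoint.

```python
import copy


class Filer:
    def __init__(self):
        self.memory = {}        # in-memory file system state
        self.checkpoint = {}    # last consistent image on disk
        self.nvram_log = []     # requests since that checkpoint

    def handle(self, name: str, data: str) -> None:
        self.nvram_log.append((name, data))   # log first: survives a crash
        self.memory[name] = data

    def take_checkpoint(self) -> None:
        self.checkpoint = copy.deepcopy(self.memory)
        self.nvram_log.clear()                # logged requests are now on disk

    def crash_and_recover(self) -> None:
        self.memory = copy.deepcopy(self.checkpoint)  # 1. go to latest checkpoint
        for name, data in self.nvram_log:             # 2. replay the log
            self.memory[name] = data
```

A request handled after the last checkpoint is recovered from the log alone, so no file-system-wide consistency scan is needed in this model.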

Continued operation at checkpoint

WAFL divides the NVRAM into two separate logs
When one log gets full, WAFL switches to the other log and starts writing a consistency point to store the changes from the first log safely on disk.
Scheduling a consistency point every 10 seconds should, in most cases, avoid overflowing the log
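
The two-log arrangement can be sketched as a double buffer. This is an illustrative sketch only: `DualLog` is a hypothetical name, the consistency point completes immediately instead of running in the background, and the 10-second timer is left out.

```python
class DualLog:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.logs = [[], []]
        self.active = 0             # index of the log receiving requests
        self.flushed = []           # requests made durable by consistency points

    def append(self, request: str) -> None:
        if len(self.logs[self.active]) == self.capacity:
            self.switch()
        self.logs[self.active].append(request)

    def switch(self) -> None:
        """Start a consistency point for the full log; log into the other."""
        full = self.active
        self.active = 1 - full
        # A real server would write the consistency point in the background
        # while new requests keep arriving; here it finishes at once.
        self.flushed.extend(self.logs[full])
        self.logs[full].clear()
```

The point of the split is that logging never has to stop: while one half is being written to disk, the other half accepts new requests.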

Why use logical logging

Logging NFS requests to NVRAM is better than using NVRAM to cache writes at the disk driver layer
Processing an NFS request and caching the resulting disk writes generally takes much more NVRAM than simply logging the information required to replay the request.
Examples:
   Write 4 KB to a file, changes:
      1. File attributes (mtime)
      2. Data block
      3. Indirect block
   Rename

Why use logical logging II

NVRAM also allows quick response to NFS requests
With a typical mix of NFS operations, WAFL can store more than 1000 operations per megabyte of NVRAM.
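
A rough calculation shows why logging the request is cheaper than caching the blocks it dirties. The numbers below are a back-of-the-envelope sketch: the 128-byte request header is an assumption, and the model assumes the file attributes live in a 4 KB i-node-file block.

```python
BLOCK = 4 * 1024
HEADER = 128                     # assumption: bytes of NFS request metadata

# A 4 KB write dirties the data block, the indirect block, and the
# i-node block that holds mtime (the three changes listed above).
physical_bytes = 3 * BLOCK
logical_bytes = BLOCK + HEADER   # payload plus what is needed to replay it

ops_per_mb = 1000                # lower bound quoted in the slide
avg_bytes_per_op = (1024 * 1024) / ops_per_mb
```

Under these assumptions the block-level cache needs about three times the NVRAM of the logical log entry, and the quoted 1000 operations per megabyte corresponds to an average of roughly 1 KB per logged operation.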

Why optimize for writes?

Write performance is especially important for network file servers.
As read caches get larger at both the client and server, writes begin to dominate the I/O subsystem
This effect is especially pronounced with NFS, which allows very little client-side write caching.
Result: the disks on an NFS server may have 5 times as many write operations as reads

Write allocation

WAFL can write any file system block (except the one containing the root i-node) to any location on disk.
   1. In FFS, meta-data is kept at fixed locations
   2. This prevents FFS from optimizing writes by, for example, putting both the data for a newly updated file and its i-node right next to each other on disk

Write allocation II

WAFL can write blocks to disk in any order
   1. FFS writes blocks to disk in a carefully determined order so that fsck(8) can restore file system consistency after an unclean shutdown.
   2. WAFL’s constraint: must write all the blocks in a new consistency point before it writes the root i-node for the consistency point.

Delayed allocation

WAFL gathers up hundreds of NFS requests before scheduling a consistency point
Then it allocates blocks for all requests in the consistency point at once
Deferring write allocation
   1. Improves the latency of NFS operations by removing disk allocation from the processing path of the reply
   2. Avoids wasting time allocating space for blocks that are removed before they reach disk
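
Deferred allocation can be sketched as a cache of dirty blocks that receive disk addresses only at the consistency point. This is a toy model: `DelayedAllocator` and its methods are hypothetical names, and handing out consecutive integers stands in for real block placement.

```python
class DelayedAllocator:
    def __init__(self):
        self.pending = {}       # dirty blocks with no disk address yet
        self.next_free = 0
        self.allocations = 0

    def write(self, key: str, data: str) -> None:
        self.pending[key] = data        # reply can be sent; nothing allocated

    def remove(self, key: str) -> None:
        self.pending.pop(key, None)     # never reaches disk, never allocated

    def consistency_point(self) -> dict[str, int]:
        """Allocate all surviving blocks at once, at adjacent addresses."""
        placement = {}
        for key in self.pending:
            placement[key] = self.next_free
            self.next_free += 1
            self.allocations += 1
        self.pending.clear()
        return placement
```

A temporary file that is written and removed between two consistency points costs no allocation at all in this model, and the surviving blocks end up next to each other.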

Allocation heuristics

Improve RAID performance by writing to multiple blocks in the same stripe
Reduce seek time by writing blocks to locations that are near each other on disk
Reduce head contention when reading large files by placing sequential blocks in a file on a single disk in the RAID array

Allocation map

Most file systems
   1. Keep track of free blocks using a bit map with one bit per disk block
   2. If the bit is set, then the block is in use
This technique does not work for WAFL because many snapshots can reference a block at the same time

Allocation map II

WAFL:
   1. The block-map file contains a 32-bit entry for each 4 KB disk block
   2. Bit 0 is set if the active file system references the block
   3. Bit 1 is set if the first Snapshot references the block, etc.
   4. A block is in use if any of the bits in its block-map entry are set
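
The entry layout above maps directly onto bit operations. The following is a minimal sketch with hypothetical helper names (`set_bit`, `clear_bit`, `in_use`); a plain integer stands in for one 32-bit block-map entry.

```python
ACTIVE_BIT = 0                      # bit 0: active file system


def set_bit(entry: int, bit: int) -> int:
    return entry | (1 << bit)


def clear_bit(entry: int, bit: int) -> int:
    return entry & ~(1 << bit) & 0xFFFFFFFF   # keep the entry within 32 bits


def in_use(entry: int) -> bool:
    """A block is in use if any bit in its 32-bit entry is set."""
    return entry != 0


entry = set_bit(0, ACTIVE_BIT)        # block allocated by the active file system
entry = set_bit(entry, 1)             # first Snapshot now references it too
entry = clear_bit(entry, ACTIVE_BIT)  # active file system drops the block
```

At the end of this sequence the block is no longer part of the active file system, yet it stays in use because the Snapshot bit is still set.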

Example lifetime of a block

Create a snapshot

The challenge: avoid locking out incoming NFS requests.
New NFS requests may need to change cached data that is part of the Snapshot and which must remain unchanged until it reaches disk
The trivial solution:
   1. Suspend NFS processing
   2. Write the Snapshot
   3. Resume NFS processing
However, writing a Snapshot can take over a second, which is too long for an NFS server to stop responding
A consistency point Snapshot is created every 10 seconds

Keeping snapshot data self consistent

Mark all the dirty data in the cache as IN_SNAPSHOT
During Snapshot creation:
   1. Data marked IN_SNAPSHOT must not be modified
   2. Data not marked IN_SNAPSHOT must not be flushed to disk

Keeping snapshot data self consistent II

NFS requests can
   1. Read all file system data
   2. Modify data that isn’t IN_SNAPSHOT
Processing for requests that need to modify IN_SNAPSHOT data must be deferred
To minimize the delay for these requests, IN_SNAPSHOT data must be flushed as quickly as possible
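
The IN_SNAPSHOT rules can be sketched as a flag on each cached buffer. This is a simplified illustration: the `Buffer` class and the functions `begin_snapshot`, `modify`, and `flush` are hypothetical, and a Python list stands in for the disk.

```python
class Buffer:
    def __init__(self, data: str):
        self.data = data
        self.dirty = True
        self.in_snapshot = False
        self.waiting = []            # deferred modifications


def begin_snapshot(cache: list[Buffer]) -> None:
    """Mark all dirty data in the cache as IN_SNAPSHOT."""
    for buf in cache:
        if buf.dirty:
            buf.in_snapshot = True


def modify(buf: Buffer, data: str) -> bool:
    """Apply a write now, or defer it if the buffer is IN_SNAPSHOT."""
    if buf.in_snapshot:
        buf.waiting.append(data)
        return False
    buf.data, buf.dirty = data, True
    return True


def flush(buf: Buffer, disk: list[str]) -> None:
    """Write an IN_SNAPSHOT buffer, then restart requests waiting on it."""
    assert buf.in_snapshot, "only IN_SNAPSHOT data may be flushed"
    disk.append(buf.data)
    buf.in_snapshot, buf.dirty = False, False
    waiting, buf.waiting = buf.waiting, []
    for data in waiting:
        modify(buf, data)
```

Reads are never blocked in this model; only writers to IN_SNAPSHOT buffers wait, and they resume as soon as that one buffer reaches disk.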

Flushing IN_SNAPSHOT data fast

Allocate disk space for all files with IN_SNAPSHOT blocks
   1. I-nodes are cached in
      (a) A special in-core i-node cache
      (b) Disk buffers
   2. After allocation, copy i-nodes from the in-core cache to disk buffers
   3. Allows continued operation on in-core i-nodes
   4. Does not require actual I/O
Update the block-map file: for each block-map entry, copy the bit for the active file system to the bit for the new Snapshot

Flushing IN_SNAPSHOT II

Write all IN_SNAPSHOT disk buffers in cache to their newly-allocated locations on disk.
As soon as a particular buffer is flushed, any NFS requests waiting to modify it can be restarted.
Duplicate the root i-node
   1. Create an i-node that represents the new Snapshot
   2. Turn the root i-node’s IN_SNAPSHOT bit off
   3. The new Snapshot i-node must not reach disk until after all other blocks in the Snapshot have been written

Creating a snapshot

Once the new Snapshot i-node has been written
   1. No more IN_SNAPSHOT data exists in cache
   2. Any NFS requests that are still suspended can be processed
Under normal loads the four steps (allocate disk space, update the block-map file, write the IN_SNAPSHOT buffers, duplicate the root i-node) can be performed in less than a second.
Step (1) can generally be done in just a few hundredths of a second, and once WAFL completes it, very few NFS operations need to be delayed

Deleting a snapshot

Deleting a Snapshot is trivial
   1. Zero the root i-node representing the Snapshot
   2. Clear the bit representing the Snapshot in each block-map entry
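
The block-map side of creating and deleting a Snapshot can be sketched as two passes over the map. This is an illustrative sketch: a list of integers stands in for the block-map file, the function names are hypothetical, and zeroing the Snapshot's root i-node is not modelled.

```python
def create_snapshot(block_map: list[int], snap_bit: int) -> None:
    """Copy the active bit (bit 0) of every entry into the snapshot's bit."""
    for i, entry in enumerate(block_map):
        if entry & 1:
            block_map[i] = entry | (1 << snap_bit)
        else:
            block_map[i] = entry & ~(1 << snap_bit)


def delete_snapshot(block_map: list[int], snap_bit: int) -> None:
    """Clear the snapshot's bit in every entry."""
    for i, entry in enumerate(block_map):
        block_map[i] = entry & ~(1 << snap_bit)


def free_blocks(block_map: list[int]) -> list[int]:
    return [i for i, entry in enumerate(block_map) if entry == 0]


block_map = [0b1, 0b1, 0b0]          # blocks 0 and 1 belong to the active file system
create_snapshot(block_map, 1)        # Snapshot 1 now references blocks 0 and 1
block_map[1] &= ~1                   # active file system rewrites block 1 elsewhere
still_free = free_blocks(block_map)  # block 1 is kept alive by the Snapshot
delete_snapshot(block_map, 1)
```

Block 1 becomes free only after the Snapshot is deleted, which traces one possible lifetime of a block: allocated, captured by a Snapshot, dropped by the active file system, then released.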

Performance

Performance comparison of several file systems from 1994
WAFL has good performance
   1. Uses fewer file systems (8 here)
   2. Other file systems are not optimized for NFS

Summary

WAFL is a very interesting file system
   1. Log structure without a cleaner
   2. Appliance philosophy
   3. Optimized for writes
   4. NVRAM for log
   5. Cheap snapshots
WAFL has been successfully used by Network Appliance for the past 10 years
Has set the industry bar in
   1. NFS serving
   2. Snapshot performance and ease of use