File Systems Part 5 - Brown University, cs.brown.edu/courses/cs167/lectures/18File5X.pdf
Operating Systems In Depth XVIII–1 Copyright © 2020 Thomas W. Doeppner. All rights reserved.
File Systems Part 5
Beyond Disks: Flash
Pro
• Flash block ≈ file-system block
• Random access
• Low power
• Vibration-resistant
Con
• Limited lifetime
• Writes can be expensive
• Cost more than disks
  – 1TB SSD: $164.99
    - x4.89 roughly eight years ago
  – 3TB disk: $47.50
    - x2.67 roughly eight years ago
Flash Memory
• Two technologies
  – NOR
    - byte addressable
  – NAND
    - page addressable
• Writing
  – newly “erased” block is all ones
  – “programming” changes some ones to zeroes
    - per byte in NOR; per page in NAND (multiple pages/block)
    - to change zeroes to ones, must erase entire block
    - can erase no more than ~100k times/block
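The write rules above can be sketched with a toy model. This is only an illustration: the class `FlashBlock`, its sizes, and its method names are hypothetical, and a byte stands in for a NAND page. The point is that programming behaves like a bitwise AND, so only an erase can turn a zero back into a one.

```python
ERASED = 0xFF  # an erased byte is all ones


class FlashBlock:
    """Toy model of one flash block (hypothetical; sizes are illustrative)."""

    def __init__(self, size=8, max_erases=100_000):
        self.cells = bytearray([ERASED] * size)
        self.erase_count = 0
        self.max_erases = max_erases

    def program(self, offset, value):
        # Programming can only clear bits (1 -> 0), so the stored result
        # is the bitwise AND of the old and new contents.
        self.cells[offset] &= value
        return self.cells[offset] == value  # False if a 0 -> 1 change was needed

    def erase(self):
        # Erasing resets every bit in the block to 1 and uses one erase cycle.
        if self.erase_count >= self.max_erases:
            raise RuntimeError("block worn out")
        self.cells = bytearray([ERASED] * len(self.cells))
        self.erase_count += 1


blk = FlashBlock()
first = blk.program(0, 0b1010_1010)    # fresh cell: succeeds
second = blk.program(0, 0b1111_0000)   # would need 0 -> 1 changes: fails
stored = blk.cells[0]                  # the AND of both values
blk.erase()
third = blk.program(0, 0b1111_0000)    # succeeds after erasing the whole block
```

The failed second write is why an in-place update generally costs a full block erase.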
Coping
• Wear leveling
  – spread writes (erasures) across entire drive
• Flash translation layer (FTL)
  – specification from 1994
  – provides disk-like block interface
  – maps disk blocks to flash blocks
    - mapping changed dynamically to effect wear-leveling
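A minimal sketch of the dynamic mapping, assuming a made-up policy of "always write to the least-worn free block". `ToyFTL` and its fields are hypothetical names, not taken from the FTL specification; real FTLs are considerably more involved.

```python
class ToyFTL:
    """Hypothetical flash translation layer: logical block -> physical block."""

    def __init__(self, num_physical):
        self.mapping = {}                       # logical -> physical
        self.erase_counts = [0] * num_physical  # wear per physical block
        self.free = set(range(num_physical))
        self.data = {}                          # physical -> contents

    def write(self, logical, contents):
        # Pick the least-worn free block instead of overwriting in place.
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.data[target] = contents
        old = self.mapping.get(logical)
        self.mapping[logical] = target
        if old is not None:
            # The old copy is obsolete: erase it and return it to the free pool.
            del self.data[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.data[self.mapping[logical]]


ftl = ToyFTL(num_physical=4)
for i in range(12):
    ftl.write(0, f"version {i}")   # rewrite the same logical block 12 times
```

Although one logical block is rewritten repeatedly, the erasures end up spread almost evenly over all four physical blocks.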
Flash with FTL
• Which file system?
  – FAT32 (sort of like S5FS, but from Microsoft)
  – NTFS
  – FFS
  – Ext3
• All were designed to exploit disks
  – much of what they do is irrelevant for flash
A Problem Case (1)
[Figure: blocks labeled Free, Free, and Modified]
A Problem Case (2)
[Figure: six blocks labeled Free and one labeled Modified]
1) Copy
2) Erase
3) Copy and modify
Trimming
[Figure: blocks labeled Free, Free, and Modified]
1) Copy
2) Erase
3) Copy and modify
Flash without FTL
• Known as memory technology device (MTD)
  – software wear-leveling
  – perhaps other tricks
JFFS and JFFS2
• Journaling flash file system
  – log-based: no journal!
    - each log entry contains inode info and some data
    - garbage collection copies info out of partially obsoleted blocks, allowing block to be erased
    - complete index of inodes kept in RAM
      • entire file system must be read when mounted
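The mount-time scan can be sketched as follows. The entry layout (`LogEntry` with `inode`, `version`, `offset`, `data`) is a simplifying assumption for illustration and is not the JFFS on-flash format; in particular it assumes rewrites always land on the same offsets.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    inode: int      # which file the entry belongs to
    version: int    # a higher version supersedes lower ones
    offset: int     # byte offset of the data within the file
    data: bytes


def mount(flash_log):
    """Scan every entry on the medium and build the in-RAM inode index."""
    index = {}  # inode -> {offset: (version, position in log)}
    for pos, entry in enumerate(flash_log):
        extents = index.setdefault(entry.inode, {})
        current = extents.get(entry.offset)
        if current is None or entry.version > current[0]:
            extents[entry.offset] = (entry.version, pos)
    return index


def read_file(flash_log, index, inode):
    extents = index[inode]
    return b"".join(flash_log[extents[off][1]].data for off in sorted(extents))


log = [
    LogEntry(inode=1, version=1, offset=0, data=b"hello "),
    LogEntry(inode=1, version=2, offset=6, data=b"world"),
    LogEntry(inode=1, version=3, offset=0, data=b"HELLO "),  # obsoletes entry 0
]
index = mount(log)
contents = read_file(log, index, 1)
```

Because the index lives only in RAM, `mount` has to visit every log entry, which is the cost the last bullet points out. Entry 0 is now obsolete and is what garbage collection would reclaim.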
UBI/UBIFS
• UBI (unsorted block images)
  – supports multiple logical volumes on one flash device
  – performs wear-leveling across entire device
  – handles bad blocks
• UBIFS
  – file system layered on UBI
  – it really has a journal (originally called JFFS3)
  – file map kept in flash as B+ tree
  – no need to scan entire file system when mounted
Flash as Part of the Hierarchy
• Flash as log device
  – aggregate write throughput sufficient, but latency is bad
  – augment with DRAM and a “super-capacitor”
• Flash as cache
  – large level-2 cache
    - integrated into ZFS
    - can use cheaper (slower) disks with no loss of performance
      • reduced power consumption
Apple Fusion Drives (FD)
• SSD used along with hard drive
• Implemented in the LVM
  – total capacity is sum of disk and SSD capacities
    - works with both HFS+ and ZFS
  – SSD used to buffer all incoming writes
  – data is moved from disk to SSD if used sufficiently often
    - migration happens in background
FD Observations
• Implementation is in the LVM, i.e., below the file system
  – all decisions based on block access
• 4GB available on SSD for writes
  – all writes go to SSD while there’s space
  – otherwise go to HDD
• Frequent reads trigger promotion to SSD
• Data transferred between SSD and HDD in units of 128KB
FD Write
if !onSSD(block) {
    if SSDfreeSpace > 0 {
        remap(block)
        writeOnSSD(block)
    } else {
        writeOnHDD(block)
    }
} else {
    writeOnSSD(block)
}
FD Read
Update_Usage(block)
if onSSD(block)
    readFromSSD(block)
else
    readFromHDD(block)
FD Background Activities
when (accessThresholdReached(block)) {
    if SSDfreeSpace > 0 {
        remap(block)
        writeOnSSD(block)
    }
}
when (SSDfreeSpace < 4GB) {
    for each infrequent block {
        remap(block)
        writeOnHDD(block)
    }
}
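The write, read, and promotion rules above can be combined into one small runnable model. Everything here is a sketch under assumptions: `ToyFusionDrive`, the read threshold of 3, and counting SSD space in whole blocks are all invented for illustration, and demotion back to the HDD is left out.

```python
class ToyFusionDrive:
    """Hypothetical model of the write, read, and promotion rules."""

    def __init__(self, ssd_blocks, access_threshold=3):
        self.ssd_free = ssd_blocks
        self.on_ssd = set()          # block map: blocks currently on the SSD
        self.usage = {}              # block -> read count
        self.access_threshold = access_threshold

    def write(self, block):
        if block in self.on_ssd:
            return "SSD"             # already mapped to the SSD
        if self.ssd_free > 0:
            self._remap_to_ssd(block)
            return "SSD"
        return "HDD"                 # no SSD space left

    def read(self, block):
        self.usage[block] = self.usage.get(block, 0) + 1
        return "SSD" if block in self.on_ssd else "HDD"

    def background_promote(self):
        # Move frequently read HDD blocks to the SSD while space remains.
        for block, count in sorted(self.usage.items()):
            if (count >= self.access_threshold and block not in self.on_ssd
                    and self.ssd_free > 0):
                self._remap_to_ssd(block)

    def _remap_to_ssd(self, block):
        self.on_ssd.add(block)
        self.ssd_free -= 1


fd = ToyFusionDrive(ssd_blocks=2)
w = [fd.write(b) for b in ("a", "b", "c")]   # third write finds the SSD full

fd2 = ToyFusionDrive(ssd_blocks=1)
before = [fd2.read("x") for _ in range(3)]   # all served from the HDD
fd2.background_promote()
after = fd2.read("x")                        # now served from the SSD
```

Note that every decision is made per block, with no knowledge of files, which matches an implementation that sits below the file system.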
FD Block Map
• Vital data structure
• Kept up to date on SSD
• Perhaps backed up on HDD
Case Studies
• NTFS
• WAFL
• ZFS
NTFS
• “Volume aggregation” options
  – spanned volumes
  – RAID 0 (striping)
  – RAID 1 (mirroring)
  – RAID 5
  – snapshots
Backups
• Want to back up a file system
  – while still using it
    - files are being modified while the backup takes place
    - applications may be in progress: files in inconsistent states
• Solution
  – have critical applications quickly reach a safe point and pause
  – snapshot the file system
  – resume applications
  – back up the snapshot
Windows Snapshots
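[Figure: normal applications use the C volume and the backup program uses the shadow C volume, both through the file system. Below the file system, the snapshot driver presents the C drive (blocks 0 1′ 2 3 4′ 5 6′ 7′) and the shadow C drive (blocks 1 4 6 7).]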
NTFS File Records
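[Figure: two file records. The first holds Name, Standard attributes, Object ID, and Data stream. The second holds Name, Standard attributes, Object ID, Data stream, and Properties stream; Extent 1, Extent 2, and Extent 3 are drawn outside the record.]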
Additional NTFS Features
• Data compression
  – run-length encoding of zeroes
  – compressed blocks
• Encrypted files
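To make "run-length encoding of zeroes" concrete, here is a generic sketch of the idea. The run representation (`("zeros", count)` and `("data", bytes)` tuples) is an assumption chosen for readability; it is not NTFS's on-disk format.

```python
def rle_zero_encode(data: bytes):
    """Return a list of ("zeros", count) and ("data", bytes) runs."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        if data[i] == 0:
            while j < len(data) and data[j] == 0:
                j += 1
            runs.append(("zeros", j - i))     # store only the length
        else:
            while j < len(data) and data[j] != 0:
                j += 1
            runs.append(("data", data[i:j]))  # store the bytes themselves
        i = j
    return runs


def rle_zero_decode(runs):
    out = bytearray()
    for kind, value in runs:
        out += bytes(value) if kind == "zeros" else value
    return bytes(out)


sparse = b"AB" + bytes(1000) + b"C"
runs = rle_zero_encode(sparse)
restored = rle_zero_decode(runs)
```

A file that is mostly zeroes shrinks to a handful of runs, while files without long zero runs gain nothing, which is presumably why compressed blocks are offered as a separate option.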
WAFL
• Runs on special-purpose OS
  – machine is dedicated to being a filer
  – handles both NFS and CIFS requests
• Utilizes shadow paging and log-structured writes
• Provides snapshots
WAFL and RAID
[Figure: data blocks and parity blocks across the RAID disks, with the data from one consistency point highlighted]
Consistency Points … and Beyond
• Consistency points taken every ~10 seconds
  – too relaxed for many applications
    - NFS
    - databases
• Solution …
  (battery-backed-up RAM)
  (a.k.a. non-volatile RAM (NVRAM))
Snapshots
• Periodic snapshots kept of file system
  – made easy with shadow paging
Taking a Snapshot (Before)
[Figure: block tree, top to bottom: Root; inode file indirect blocks; inode file data blocks; regular file indirect blocks; regular file data blocks]
Taking a Snapshot (After)
[Figure: the same block tree (Root; inode file indirect blocks; inode file data blocks; regular file indirect blocks; regular file data blocks), now with a Snapshot Root alongside the Root]
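Why shadow paging makes snapshots cheap can be sketched with an in-memory tree. This is a generic copy-on-write illustration, not WAFL's actual structures: `Node`, `update`, and `read` are hypothetical names, and the tree is much shallower than the one in the figure.

```python
class Node:
    """Immutable tree block: either a leaf with data or an interior node."""

    def __init__(self, data=None, children=()):
        self.data = data
        self.children = tuple(children)


def update(node, path, new_data):
    """Return a new tree with the leaf at `path` replaced; the old tree is untouched."""
    if not path:
        return Node(data=new_data)
    i = path[0]
    children = list(node.children)
    children[i] = update(children[i], path[1:], new_data)  # copy only this path
    return Node(children=children)


def read(node, path):
    for i in path:
        node = node.children[i]
    return node.data


root = Node(children=[Node(children=[Node("A"), Node("B")]),
                      Node(children=[Node("C"), Node("D")])])
snapshot_root = root                   # taking a snapshot: keep the old root
root = update(root, (0, 1), "B2")      # a later write goes to new blocks
```

Since blocks are never overwritten in place, keeping the old root is enough to preserve the whole old tree; the snapshot and the live file system share every block that has not changed.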
Paranoia
• You think your files are safe simply because they’re on a RAID-4 or RAID-5 system …
  – power failure at inopportune moment
    - parity is irreparably wrong
  – obscure bug in controller firmware or OS
    - data is garbage (but with correct parity!)
  – sysadmin accidentally scribbled on one drive
    - (profuse apologies … )
  – out of disk space
    - must restructure 4TB file system
  – out of address space
    - 2^64 isn’t as big as it used to be
Partial Writes
[Figure: data blocks and parity blocks]
Small Writes
[Figure: data blocks and parity blocks. Writing the highlighted data blocks requires reading and then writing the corresponding parity block.]
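The read-then-write of parity follows from the XOR rule: new parity = old parity XOR old data XOR new data. A minimal sketch, with block contents and the helper names `xor` and `small_write` chosen only for illustration:

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity, index, new_data):
    """Update one data block; return (new parity, block reads, block writes)."""
    old_data = stripe[index]                           # read 1: old data
    new_parity = xor(xor(parity, old_data), new_data)  # read 2: old parity
    stripe[index] = new_data                           # write 1: new data
    return new_parity, 2, 2                            # write 2: new parity


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = b"\x00\x00"
for blk in stripe:
    parity = xor(parity, blk)

parity, reads, writes = small_write(stripe, parity, 1, b"\xaa\xbb")

# Recompute parity from scratch to confirm the incremental update.
full = b"\x00\x00"
for blk in stripe:
    full = xor(full, blk)
```

One logical block write therefore turns into two reads and two writes, which is the small-write penalty of parity RAID.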
Hardware RAID
[Figure: data blocks and parity blocks managed by a RAID controller]
Adding a Disk (1)
[Figure: two disks under an LVM, and a new disk marked “?”]
Adding a Disk (2)
[Figure: two groups of two disks, each group under its own LVM, and a new disk marked “?”]
ZFS: The Last (?!) Word in File Systems
ZFS Layers
[Figure: five disks forming a Storage Pool, with Data Management layered on top]
Enter ZFS
[Figure: ZFS layer stack, top to bottom, with the interface passed between layers]
VFS
  ↓ Vnode operations
ZFS POSIX Layer (ZPL)
  ↓ <dataset, object, offset>
Data Management Unit (DMU): 2^64 objects, each up to 2^64 bytes; provides transactions on objects
  ↓ <data virtual address> (128-bit addresses!)
Storage Pool Allocator (SPA): maps virtual blocks to disks and physical blocks
  ↓ <physical device, offset>
Device Driver
Shadow-Page Tree (with a twist …)
[Figure: shadow-page tree rooted at the überblock; each entry holds a pointer and a checksum; ditto blocks]
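Keeping the checksum next to the pointer, rather than inside the block it describes, can be sketched as below. The dictionary-based `disk`, the function names, and the use of SHA-256 are assumptions made for this illustration only.

```python
import hashlib

disk = {}  # address -> bytes; stands in for the storage pool


def write_block(address, payload):
    """Store a block and return a pointer that carries the block's checksum."""
    disk[address] = payload
    return {"address": address,
            "checksum": hashlib.sha256(payload).hexdigest()}


def read_block(pointer):
    payload = disk[pointer["address"]]
    if hashlib.sha256(payload).hexdigest() != pointer["checksum"]:
        raise IOError("checksum mismatch: block is corrupt")
    return payload


ptr = write_block(7, b"file data")
ok = read_block(ptr)

disk[7] = b"file dat\x00"     # silent corruption below the file system
try:
    read_block(ptr)
    detected = False
except IOError:
    detected = True
```

Because the parent vouches for the child, a block that was silently altered, or written to the wrong place, fails verification when it is next read through its pointer.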
Storage Pool Allocator
[Figure: the Data Management Unit (DMU) sits above the Storage Pool Allocator, which provides mirroring, spanning, or RAID]
RAID-Z Software Dynamic Striping
Disk 1   Disk 2   Disk 3   Disk 4   Disk 5   Disk 6
a1       a2       a3       ap1-3    b1       b2
b3       b4       bp1-4    c1       c2       cp1-2
d1       d2       d3       d4       d5       dp1-5
d6       d7       dp6-7
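The layout in the figure can be reproduced by a simple placement rule, sketched here. The rule is inferred from the figure (each write becomes stripes of at most one block per disk, each stripe ending in its own parity block, laid out round-robin); `raidz_layout` is a hypothetical helper, not ZFS code.

```python
def raidz_layout(writes, num_disks):
    """Place each write as data blocks plus parity, filling disks round-robin.

    writes: list of (name, number of data blocks).
    Returns rows of labels, one column per disk.
    """
    cells = []
    for name, count in writes:
        start = 1
        while start <= count:
            # A stripe holds at most num_disks - 1 data blocks plus one parity
            # block, so no two of its blocks share a disk.
            end = min(start + num_disks - 2, count)
            cells += [f"{name}{i}" for i in range(start, end + 1)]
            cells.append(f"{name}p{start}-{end}")
            start = end + 1
    return [cells[i:i + num_disks] for i in range(0, len(cells), num_disks)]


rows = raidz_layout([("a", 3), ("b", 4), ("c", 2), ("d", 7)], num_disks=6)
```

Every stripe is a full stripe sized to the write, so parity is computed from data already in hand and no read-modify-write of old parity is needed.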
Quiz
Compared with RAID 4, which of the following would be more time-consuming with RAID-Z?
a) adding a disk
b) replacing a crashed disk
c) both
d) neither
RAID-Z Adding a Disk
Disk 1   Disk 2   Disk 3   Disk 4   Disk 5   Disk 6   Disk 7
a1       a2       a3       ap1-3    b1       b2
b3       b4       bp1-4    c1       c2       cp1-2
d1       d2       d3       d4       d5       dp1-5
d6       d7       dp6-7    e1       e2       e3       e4
e5       e6       ep1-6    e7       e8       e9       ep7-9
Scenarios
• Power failure at inopportune moment
  – “live data” is not modified
  – single lost write can be recovered
• Obscure bug in controller firmware or OS
  – detected by checksum in pointer
• Sysadmin accidentally scribbled on one drive
  – detected and repaired
• Out of disk space
  – add to the pool; SPA will cope
• Out of address space
  – 2^128 is big
    - 1 address per cubic yard of a sphere bounded by the orbit of Neptune
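The Neptune comparison can be checked with rough arithmetic. The orbital radius used below (about 30 AU, taken as 4.5e12 m) is an assumed round figure, so the result is only an order-of-magnitude check.

```python
import math

NEPTUNE_ORBIT_RADIUS_M = 4.5e12             # assumed: about 30 AU
CUBIC_METRES_PER_CUBIC_YARD = 0.9144 ** 3   # 1 yard = 0.9144 m

sphere_volume_m3 = (4 / 3) * math.pi * NEPTUNE_ORBIT_RADIUS_M ** 3
sphere_volume_yd3 = sphere_volume_m3 / CUBIC_METRES_PER_CUBIC_YARD
addresses = 2 ** 128

addresses_per_cubic_yard = addresses / sphere_volume_yd3
```

Under these assumptions the ratio comes out somewhat below one, so "1 address per cubic yard" holds to within an order of magnitude.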
And There’s More …
• Adaptive replacement cache
• Advanced prefetching
LRU Caching
• LRU cache holds the n most-recently-used disk blocks
  – working sets of current processes
• New process reads n-block file sequentially
  – cache fills with this file’s blocks
  – old contents flushed
  – new cache contents never accessed again
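The flushing effect is easy to reproduce with a small LRU cache. This is a generic sketch; the block names and the cache size of 4 are arbitrary.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, n):
        self.n = n
        self.blocks = OrderedDict()   # least recently used first

    def access(self, block):
        """Return True on a hit, False on a miss."""
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.n:
                self.blocks.popitem(last=False)   # evict the LRU block
            self.blocks[block] = True
        return hit


n = 4
cache = LRUCache(n)
working_set = ["w0", "w1", "w2", "w3"]
for b in working_set * 2:
    cache.access(b)                   # working set is now fully cached

for i in range(n):
    cache.access(f"scan{i}")          # one sequential pass over an n-block file

hits_after_scan = sum(cache.access(b) for b in working_set)
```

After a single pass over the n-block file, every access to the old working set misses.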
(Non-Adaptive) Solution
• Split cache in two
  – half of it is for blocks that have been referenced exactly once
  – half of it is for blocks that have been referenced more than once
• Is 50/50 split the right thing to do?
Adaptive Replacement Cache
[Figure: lists t1 and t2, each in LRU order, above lists b1 and b2; sizes labeled n]
t1; b1: LRU list of blocks referenced once
  – t1 list (most recently used) contains contents
  – b1 list (least recently used) contains just references
t2; b2: LRU list of blocks referenced more than once
  – t2 list (most recently used) contains contents
  – b2 list (least recently used) contains just references
Adaptive Replacement Cache
[Figure: lists t1, t2, b1, and b2, as on the previous slide]
cache miss:
  if t1 is full
    evict LRU(t1) and make it MRU(b1)
  referenced block becomes MRU(t1)
Adaptive Replacement Cache
[Figure: lists t1, t2, b1, and b2, as on the previous slides]
cache hit:
  if in t1 or t2, block becomes MRU(t2)
  otherwise
    if block is referred to by b1, increase t1 space at expense of t2
    otherwise increase t2 space at expense of t1
    if t1 is full, evict LRU(t1) and make it MRU(b1)
    if t2 is full, evict LRU(t2) and make it MRU(b2)
    insert block as MRU(t2)
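The miss and hit rules can be turned into a small runnable sketch. This is a simplification written for illustration, not the published ARC algorithm or the ZFS implementation: `SimpleARC`, the starting split `p = n // 2`, the step size of one block, and treating each list's share as a hard limit are all assumptions.

```python
from collections import OrderedDict


class SimpleARC:
    """Simplified sketch of the rules above; not the full published algorithm."""

    def __init__(self, n):
        self.n = n
        self.p = n // 2              # target size of t1 (assumed starting point)
        self.t1 = OrderedDict()      # referenced once, contents cached
        self.t2 = OrderedDict()      # referenced more than once, contents cached
        self.b1 = OrderedDict()      # references only, evicted from t1
        self.b2 = OrderedDict()      # references only, evicted from t2

    def _shrink(self, cached, ghosts, limit):
        while len(cached) > limit:
            key, _ = cached.popitem(last=False)   # LRU entry
            ghosts[key] = None                    # becomes MRU of the ghost list
        while len(ghosts) > self.n:
            ghosts.popitem(last=False)

    def access(self, key):
        """Return True if the block's contents were in the cache."""
        if key in self.t1 or key in self.t2:
            self.t1.pop(key, None)
            self.t2.pop(key, None)
            self.t2[key] = True                   # MRU(t2)
            hit = True
        else:
            hit = False
            if key in self.b1:
                del self.b1[key]
                self.p = min(self.n - 1, self.p + 1)   # grow t1's share
                self.t2[key] = True
            elif key in self.b2:
                del self.b2[key]
                self.p = max(1, self.p - 1)            # grow t2's share
                self.t2[key] = True
            else:
                self.t1[key] = True               # MRU(t1)
        self._shrink(self.t1, self.b1, self.p)
        self._shrink(self.t2, self.b2, self.n - self.p)
        return hit


arc = SimpleARC(4)
for key in ["a", "a", "b", "b"]:
    arc.access(key)                   # a and b end up in t2
for i in range(10):
    arc.access(f"scan{i}")            # sequential scan only churns t1
survivors = [arc.access("a"), arc.access("b")]
ghost = arc.access("scan7")           # found in b1: t1's share grows
```

Unlike plain LRU, the scan leaves the twice-referenced blocks in place, and a later hit on a ghost entry shifts the split toward the list that would have held it.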
Prefetch
• FFS prefetch
  – keeps track of last block read by each process
  – fetches block i+1 if current block is i and previous was i-1
  – chokes on
    - diff file1 file2
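A sketch of the per-process rule shows why `diff` defeats it. Blocks are modeled as plain integers, and placing file1 at block 100 and file2 at block 500 is an assumption for illustration; `SequentialPrefetcher` is a hypothetical name.

```python
class SequentialPrefetcher:
    """Per-process tracker: remembers only the last two blocks read."""

    def __init__(self):
        self.prev = None
        self.last = None

    def read(self, block):
        """Return the block to prefetch, or None."""
        self.prev, self.last = self.last, block
        if self.prev is not None and block == self.prev + 1:
            return block + 1     # current is i, previous was i-1: fetch i+1
        return None


one_file = SequentialPrefetcher()
sequential = [one_file.read(b) for b in (100, 101, 102)]

diff = SequentialPrefetcher()
interleaved = [diff.read(b) for b in (100, 500, 101, 501, 102, 502)]
```

Each file is read sequentially, but the single per-process history sees the two files alternating, so the i-1, i pattern never appears and nothing is prefetched.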
zfetch
• Tracks multiple prefetch streams
• Handles four patterns
  – forward sequential access
  – backward sequential access
  – forward strided access
    - iterating across columns of matrix stored by columns
  – backward strided access
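Tracking several streams, each with its own signed stride, covers all four patterns. The sketch below is a hypothetical detector written for illustration and is not the zfetch code; `StreamPrefetcher`, `max_gap`, and the matching heuristics are assumptions.

```python
class StreamPrefetcher:
    """Hypothetical multi-stream stride detector (not the actual zfetch code)."""

    def __init__(self, max_gap=16):
        self.max_gap = max_gap
        self.streams = []   # each stream is [last_block, stride or None]

    def read(self, block):
        """Return the block to prefetch, or None."""
        # 1. Does an established stream predict this block?
        for s in self.streams:
            if s[1] is not None and s[0] + s[1] == block:
                s[0] = block
                return block + s[1]
        # 2. Otherwise pair it with a nearby stream that has no stride yet.
        for s in self.streams:
            gap = block - s[0]
            if s[1] is None and gap != 0 and abs(gap) <= self.max_gap:
                s[0], s[1] = block, gap
                return block + gap
        # 3. Otherwise start a new stream.
        self.streams.append([block, None])
        return None


two_files = StreamPrefetcher()
interleaved = [two_files.read(b) for b in (100, 500, 101, 501, 102, 502)]

backward = StreamPrefetcher()
reverse = [backward.read(b) for b in (50, 49, 48)]

strided = StreamPrefetcher()
columns = [strided.read(b) for b in (0, 8, 16)]
```

A negative stride gives backward access and a stride larger than one gives strided access, while the interleaved reads that defeat a single-history prefetcher are handled as two independent streams.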