12/7/16
Announcements
P4 graded: In Learn@UW; email 537-help@cs if problems
P5: Available - File systems
• Can work on both parts with project partner
• Watch videos; discussion section
• Part a: file system checker NOT in xv6 code base
Read as we go along!
• Chapter 43
Persistence: Log-Structured FS (LFS)
Questions answered in this lecture:
Besides journaling, how else can disks be updated atomically?
Does on-disk log help performance of writes or reads?
How to find inodes in on-disk log?
How to recover from a crash?
How to garbage collect dead information?
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 537
Introduction to Operating Systems
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
File-System Case Studies
Local
- FFS: Fast File System
- ext3, ext4: Journaling File Systems
- LFS: Log-Structured File System
• Copy-On-Write (COW) (ZFS, btrfs)
Network
- NFS: Network File System
- AFS: Andrew File System
General Strategy for Crash Consistency
Never delete ANY old data, until ALL new data is safely on disk
Implication:
At some point in time, all old AND all new data must be on disk
Two techniques popular in file systems:
1. journal: make note of new info, then overwrite old info with new info in place
2. copy-on-write: write new info to new location, discard old info (update pointers)
Review: Journal New, Overwrite In-Place
[Figure sequence: in-place file data blocks and journal blocks]
1. Start: in-place file data holds 12 and 5; journal is empty
2. Write new data 10 to journal. Imagine journal header describes in-place destinations
3. Write new data 7 to journal. Imagine journal commit block designates transaction complete
4. Perform checkpoint to in-place data when transaction is complete: 12 is overwritten with 10, then 5 with 7
5. Clear journal commit block to show checkpoint complete
TODAY: Write New, Discard Old
Make a copy-on-write (COW)
[Figure sequence: file data starts as 12, 5; new copies 10 and 7 are written to new locations next to the old blocks; old blocks 12 and 5 are then discarded, leaving 10, 7]
Obvious advantage?
• Only write new data once instead of twice
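The write-count difference can be made concrete with a small sketch. This is an illustration only, not code from the lecture: the function names, the list-based "disk", and the `pointers` dictionary are all hypothetical, and the journal header and commit blocks are ignored so that only data-block writes are counted.

```python
def journal_update(in_place, journal, updates):
    """Journal new data, then overwrite in place. Returns data-block writes."""
    writes = 0
    for index, value in updates.items():
        journal.append((index, value))    # first write: copy goes to the journal
        writes += 1
    for index, value in journal:          # checkpoint: second write, in place
        in_place[index] = value
        writes += 1
    journal.clear()                       # transaction finished, journal is reusable
    return writes


def cow_update(blocks, pointers, updates):
    """Write new data to a new location, then repoint. Returns data-block writes."""
    writes = 0
    for name, value in updates.items():
        blocks.append(value)              # the only write: new copy in free space
        pointers[name] = len(blocks) - 1  # old block becomes garbage
        writes += 1
    return writes


# Same example values as the slides: 12 and 5 are replaced by 10 and 7.
in_place = [12, 5]
journal_writes = journal_update(in_place, [], {0: 10, 1: 7})

blocks, pointers = [12, 5], {"a": 0, "b": 1}
cow_writes = cow_update(blocks, pointers, {"a": 10, "b": 7})
```

Under these assumptions journaling writes each data block twice and copy-on-write writes it once, at the cost of leaving the old blocks (12 and 5) behind as garbage.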
LFS Performance Goal
Motivation:
• Growing gap between sequential and random I/O performance
• RAID-5 especially bad with small random writes
Idea: use disk purely sequentially
Easy for writes to use disk sequentially – why?
• Can do all writes near each other to empty space – new copy
• Works well with RAID-5 (large sequential writes)
Hard for reads – why?
• User might read files X and Y not near each other on disk
• May not be too bad if disk reads are slow – why?
• Memory sizes are growing (cache more reads)
LFS Strategy
File system buffers writes in main memory until “enough” data
• How much is enough?
• Enough to get good sequential bandwidth from disk (MB)
Write buffered data sequentially to new segment on disk
• Segment is some contiguous region of blocks
Never overwrite old info: old copies left behind
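A minimal sketch of this buffering strategy follows. It assumes a hypothetical `Log` class and a tiny `SEGMENT_BLOCKS` value chosen for readability; a real segment would be megabytes, as noted above.

```python
SEGMENT_BLOCKS = 4  # hypothetical segment size, in blocks


class Log:
    def __init__(self):
        self.buffer = []  # blocks waiting in main memory
        self.disk = []    # segments; each one is written with a single sequential I/O

    def write(self, block):
        """Buffer a block; flush once a full segment has accumulated."""
        self.buffer.append(block)
        if len(self.buffer) == SEGMENT_BLOCKS:
            self.flush()

    def flush(self):
        """Append the buffered blocks as a new segment. Old segments are never overwritten."""
        if self.buffer:
            self.disk.append(tuple(self.buffer))
            self.buffer = []


log = Log()
for i in range(10):
    log.write(f"block{i}")
```

After ten writes, two full segments are on disk and the last two blocks are still buffered in memory.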
Big Picture
[Figure sequence: an in-memory buffer above the disk; the disk is divided into segments S0, S1, S2, S3]
Data Structures (attempt 1)
What data structures from FFS can LFS remove?
• allocation structs: data + inode bitmaps
What type of name is much more complicated?
• Inodes are no longer at fixed offset
• Use current offset on disk instead of table index for name
• Note: when update inode, inode name changes!!
Attempt 1
Overwrite data in /file.txt
[Figure: log across segments S0 to S3 contains I2 (root inode), Dir (root directory entries), I9 (file inode), D (file data), and the new data block D’]
How to update Inode 9 to point to new D’?
Can LFS update Inode 9 to point to new D’?
NO! This would be a random write
Must update all structures in sequential order to log
[Figure: old blocks I2, Dir, I9, D; new copies D’, I9', Dr’, I2']
Attempt 1: Problem w/ Inode Numbers
Problem: For every data update, must propagate updates all the way up the directory tree to root
Why?
When inode copied, its location (inode name) changes
Solution:
Keep inode names constant; don’t base inode name on offset
FFS found inodes with math. How in LFS?
Data Structures (attempt 2)
What data structures from FFS can LFS remove?
• allocation structs: data + inode bitmaps
What type of name is much more complicated?
• Inodes are no longer at fixed offset
• Use imap structure to map: inode number => inode location on disk
Where to keep Imap?
imap: inode number => inode location on disk
• table of millions of entries (4 bytes each)
[Figure: disk divided into segments S0, S1, S2, S3]
Where can imap be stored? Dilemma:
1. imap too large to keep in memory
2. don’t want to perform random writes for imap
Solution:
• Write imap in segments
• Keep pointers to pieces of imap in memory (crash? fix this later!)
Solution: Imap in Segments
[Figure: disk segments S0, S1, S2, S3; memory holds ptrs to imap pieces and a cached portion of imap]
Solution:
• Write imap in segments
• Keep pointers to pieces of imap in memory (crash? fix this later!)
• Keep recent accesses to imap cached in memory
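The effect of the imap can be sketched in a few lines. This is a simplified illustration, not the lecture's implementation: the log is a Python list whose indices stand in for disk addresses, the whole imap is held in an in-memory dictionary (rather than written in pieces to segments), and the helper names are hypothetical.

```python
log = []   # the on-disk log; a block's address is its index in this list
imap = {}  # inode number -> address of the newest copy of that inode


def append(block):
    """Append a block to the log and return its address."""
    log.append(block)
    return len(log) - 1


def write_file(inum, data):
    """Write new data and a new inode copy; only the imap entry changes."""
    data_addr = append(("data", data))
    inode_addr = append(("inode", inum, data_addr))
    imap[inum] = inode_addr


def read_file(inum):
    """Follow imap -> inode -> data."""
    _, _, data_addr = log[imap[inum]]
    return log[data_addr][1]


directory = {"file.txt": 9}  # directory entries store inode numbers, not offsets
write_file(9, "D")
write_file(9, "D'")          # overwrite: new D' and new inode 9 are appended
```

Because the directory stores the constant inode number 9, overwriting the file does not force new copies of the directory or the root inode.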
Example Write
[Figure: log on disk containing data, inode, and imap blocks]
Solution:
• Write imap in segments
• Keep pointers to pieces of imap in memory (crash? fix this later!)
• Keep recent accesses to imap cached in memory
create /foo/bar
[Table: reads and writes issued by create /foo/bar, with columns data bitmap, inode bitmap, root inode, foo inode, bar inode, root data, foo data]
Most data structures same in LFS as FFS!
Use imap in memory to find location of root and foo inodes
Update imap on disk with new locations for foo and bar inodes
Other Issues
Crashes
Garbage Collection
Crash Recovery
What data needs to be recovered after a crash?
• Need imap (lost in volatile memory)
Naive approach?
• Scan entire disk to reconstruct imap pieces. Slow!
Better approach?
• Occasionally checkpoint pointers to imap pieces to a known on-disk location
How often to checkpoint?
• Checkpoint often: random I/O
• Checkpoint rarely: lose more data, recovery takes longer
• Example: checkpoint every 30 secs
Checkpoint
[Figure: disk segments S0, S1, S2, S3 plus a checkpoint region; memory holds ptrs to imap pieces; the log tail extends past the last checkpoint]
Crash!
Reboot
[Figure sequence: after reboot, the in-memory ptrs to imap pieces are rebuilt]
1. Get pointers from checkpoint
2. Get pointers by scanning after tail → roll forward
Checkpoint Summary
Checkpoint occasionally (e.g., every 30s)
Upon recovery:
- read checkpoint to find most imap pointers and segment tail
- find rest of imap pointers by reading past tail
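The two recovery steps can be sketched as follows. This is a simplified illustration with hypothetical names: the checkpoint here stores inode locations directly (the slides checkpoint pointers to imap pieces), and the log is a list whose indices stand in for disk addresses.

```python
def recover(log, checkpoint):
    """Rebuild the in-memory imap after a crash."""
    imap = dict(checkpoint["imap"])                   # step 1: pointers from checkpoint
    for addr in range(checkpoint["tail"], len(log)):  # step 2: roll forward past the tail
        block = log[addr]
        if block[0] == "inode":
            imap[block[1]] = addr                     # newer inode copy wins
    return imap


log = [
    ("data", "a"), ("inode", 2, 0),   # written before the checkpoint
    ("data", "b"), ("inode", 9, 2),
    ("data", "b2"), ("inode", 9, 4),  # written after the checkpoint, before the crash
]
checkpoint = {"imap": {2: 1, 9: 3}, "tail": 4}
imap = recover(log, checkpoint)
```

Only the blocks past the checkpointed tail are scanned, which is why checkpointing more often shortens recovery.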
What if crash during checkpoint?
Checkpoint Strategy
Have two checkpoint regions
Only overwrite one checkpoint at a time
Use checksum/timestamps to identify newest checkpoint
[Figure sequence: disk segments S0, S1, S2, S3 and the two checkpoint regions over time, with ??? marking the region being written: (v2, ???) → (v2, v3) → (???, v3) → (v4, v3) → (v4, ???) → (v4, v5)]
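One way to pick the newest valid checkpoint is sketched below. The record layout (a JSON body plus a CRC32 checksum) and the function names are assumptions made for illustration; the slides only say that checksums and timestamps are used.

```python
import json
import zlib


def make_checkpoint(timestamp, payload):
    """Build a checkpoint record whose checksum covers the whole body."""
    body = json.dumps({"timestamp": timestamp, "payload": payload}, sort_keys=True)
    return {"body": body, "checksum": zlib.crc32(body.encode())}


def is_valid(region):
    """A region is valid only if its stored checksum matches its body."""
    return zlib.crc32(region["body"].encode()) == region["checksum"]


def newest_checkpoint(regions):
    """Return the valid checkpoint with the largest timestamp, or None."""
    valid = [json.loads(r["body"]) for r in regions if is_valid(r)]
    return max(valid, key=lambda c: c["timestamp"]) if valid else None


v3, v4, v5 = (make_checkpoint(t, f"v{t}") for t in (3, 4, 5))

# Model a crash while v5 was overwriting the v3 region: the stored checksum
# no longer matches the body, so the region must be ignored.
torn = {"body": v5["body"], "checksum": v5["checksum"] ^ 1}
```

Because only one region is overwritten at a time, a crash mid-write leaves the other region intact, and recovery falls back to it.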
Other Issues
Crashes
Garbage Collection
What to do with old data?
Old versions of files -> garbage
Approach 1: garbage is a feature!
• Keep old versions in case user wants to revert files later
• Versioning file systems
• Example: Dropbox
Approach 2: garbage collection…
Garbage Collection
Need to reclaim space:
1. When no more references (any file system)
2. After newer copy is created (COW file system)
LFS reclaims segments (not individual inodes and data blocks)
- Want future overwrites to be to sequential areas
- Tricky, since segments are usually partly valid
Garbage Collection
[Figure: disk segments: USED (60%), USED (10%), USED (95%), USED (35%), FREE, FREE]
compact 2 segments to one
[Figure: the live data of the 60% and 35% segments is copied into one free segment, which becomes USED (95%)]
• When moving data blocks, copy a new inode to point to them
• When moving an inode, update imap to point to it
release input segments
[Figure: the 60% and 35% segments become FREE; remaining used segments are 10%, 95%, 95%]
Garbage Collection
General operation:
Pick M segments, compact into N (where N < M).
Mechanism:
How does LFS know whether data in segments is valid?
Policy:
Which segments to compact?
Garbage Collection Mechanism
Is an inode the latest version?
• Check imap to see if this inode location is pointed to
• Fast!
Is a data block the latest version?
• Scan ALL inodes to see if any point to this data
• Very slow!
How to track information more efficiently?
• Segment summary lists inode and data offset corresponding to each data block in segment (reverse pointers)
Block Liveness
[Figure sequence: data block D in a segment asks "am i alive?"; the segment summary (SS) entry for D names its inode; the imap gives that inode's current location; the inode now points to D’, not D]
Nope!
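The liveness check can be sketched as below. The data layout is an assumption made for illustration: the log is a list indexed by address, `summary` plays the role of the segment summary (address => inode number and offset), and inodes are dictionaries with a `pointers` list.

```python
log = [
    "D",                # addr 0: old data for inode 9, offset 0
    {"pointers": [0]},  # addr 1: old copy of inode 9
    "D'",               # addr 2: new data for inode 9, offset 0
    {"pointers": [2]},  # addr 3: newest copy of inode 9
]
summary = {0: (9, 0), 2: (9, 0)}  # reverse pointers: data addr -> (inode number, offset)
imap = {9: 3}                     # inode number -> newest inode address


def is_live(addr):
    """A data block is live only if the newest inode still points at its address."""
    inum, offset = summary[addr]
    inode_addr = imap.get(inum)
    if inode_addr is None:        # file no longer exists
        return False
    pointers = log[inode_addr]["pointers"]
    return offset < len(pointers) and pointers[offset] == addr
```

Here `is_live(0)` is false (the inode now points to D’ at address 2) and `is_live(2)` is true, with no scan over all inodes.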
Garbage Collection
General operation:
Pick M segments, compact into N (where N < M).
Mechanism:
How does LFS know whether data in segments is valid? [segment summary]
Policy:
Which segments to compact?
• clean most empty first
• clean coldest (segments changing least; wait longer for others)
• more complex heuristics…
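The "clean most empty first" policy can be sketched as a sort on utilization. This is a hypothetical illustration: the function name is invented, and the utilization numbers reuse the percentages from the earlier garbage collection example.

```python
import math


def pick_segments(utilization, count):
    """Return ids of the `count` least-utilized segments (most empty first)."""
    return sorted(utilization, key=utilization.get)[:count]


# Percent of each segment that is still live.
utilization = {"S0": 60, "S1": 10, "S2": 95, "S3": 35}

victims = pick_segments(utilization, 2)           # M = 2 input segments
live = sum(utilization[s] for s in victims)       # percent of one segment to copy
outputs = math.ceil(live / 100)                   # N output segments needed
```

With these numbers the cleaner picks S1 and S3, copies 45% of a segment's worth of live data, and compacts M = 2 segments into N = 1. The "coldest first" heuristic would need per-segment age information, which this sketch does not model.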
Conclusion
Journaling:
Put final location of data wherever file system chooses (usually in a place optimized for future reads)
LFS:
Puts data where it’s fastest to write (assume future reads cached in memory)
Other COW file systems: WAFL, ZFS, btrfs