file systems, part 2 - caltech computingcourses.cms.caltech.edu/cs124/lectures/cs124lec25.pdf ·...

FILE SYSTEMS, PART 2 CS124 – Operating Systems Fall 2017-2018, Lecture 24


Last Time: File Systems
• Introduced the concept of file systems
• Explored several ways of managing the contents of files
  • Contiguous allocation
  • Extents
  • Linked allocation
  • File allocation tables
  • Started discussing indexed allocation
• Different approaches have different strengths/weaknesses
  • Performance of sequential access and direct access
  • Susceptibility to external/internal fragmentation
  • Susceptibility to data fragmentation


File Layout: Indexed Allocation
• Indexed allocation achieves the benefits of linked allocation while also being very fast for direct access
• Files include indexing information to allow for fast access
  • Each file effectively has its own file-allocation table optimized for both sequential and direct access
  • This information is usually stored separately from the file’s contents, so that programs can assume that blocks are entirely used by data

[Figure: the directory entry for file A gives the location of its index block (block 2); the index block contains the list of the file’s data-block numbers, with -1 marking unused entries.]

File Layout: Indexed Allocation (2)
• Both direct and sequential access are very fast
• Very easy to translate a logical file position into the corresponding disk block
  • Position in index = logical position / block size (integer division)
  • Use value in index to load the corresponding block into memory
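The translation described above can be sketched in a few lines. This is only an illustration: the 512-byte block size, the `locate` helper, and the contents of `index_a` are assumptions made for the example, not values taken from the slides.

```python
BLOCK_SIZE = 512  # bytes per block (assumed for illustration)


def locate(index_block, position, block_size=BLOCK_SIZE):
    """Map a logical byte position to (disk block number, offset within block)."""
    slot = position // block_size              # position in index
    if slot >= len(index_block) or index_block[slot] == -1:
        raise ValueError("position is past the end of the file")
    return index_block[slot], position % block_size


# Hypothetical index block contents; -1 marks an unused entry.
index_a = [3, 4, 10, 13, -1, -1]
print(locate(index_a, 1300))   # (10, 276): third index entry, byte 276 of that block
```

Direct access costs one division and one array lookup, regardless of how far into the file the position lies.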


File Layout: Indexed Allocation (3)
• Index block can also store file metadata
• Recall: many filesystems support hard linking of a file from multiple paths
• If metadata is stored in the directory instead of with the file, metadata must be duplicated, could get out of sync, etc.
• Indexed allocation can avoid this issue!

[Figure: a directory tree with a home directory containing user1 and user2; files A and C appear under more than one directory, alongside files B and D. The index-block diagram for file A is repeated.]

File Layout: Indexed Allocation (4)
• Obvious overhead from indexed allocation is the index
  • Tends to be greater overhead than e.g. linked allocation
• Difficult to balance concerns for small and large files
  • Don’t want small files to waste space with a mostly-empty index…
  • Don’t want large files to incur a lot of work from navigating many small index blocks…
• Index space tends to be allocated in units of storage blocks


File Layout: Indexed Allocation (5)
• Option 1: a linked sequence of index blocks
• Each index block has an array of file-block pointers
• Last pointer in index block is either “end of index” value, or a pointer to the next index block
• Good for smaller files
• Example: storage blocks of 512B; 32-bit index entries
  • 512 bytes / 4 bytes = maximum of 128 entries
  • Index block might store 100 or more entries (extra space for storing file metadata)
  • 100 entries per index block × 512-byte blocks = ~50KB file size for a single index block
• Usually want to use virtual page size as block size instead
  • Max of 1024 entries per 4KiB page
  • If index entries refer to 4KiB blocks, a single index block can be used for up to ~4MiB files before requiring a second index block
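A minimal sketch of Option 1, assuming a toy in-memory model: `index_blocks` is a hypothetical dictionary standing in for the disk, each index block is a `(pointers, next)` pair, and only four pointers fit per index block so the example stays readable.

```python
END_OF_INDEX = -1
ENTRIES_PER_BLOCK = 4   # data pointers per index block (tiny, for illustration)


def find_block(index_blocks, first_index, block_no):
    """Return the disk block that holds logical block `block_no` of a file.

    `index_blocks` maps an index block's address to (pointers, next), where
    `next` is the address of the next index block or END_OF_INDEX.
    """
    addr = first_index
    while addr != END_OF_INDEX:
        pointers, nxt = index_blocks[addr]
        if block_no < len(pointers):
            return pointers[block_no]
        block_no -= ENTRIES_PER_BLOCK   # skip the blocks this index block covers
        addr = nxt                      # one more index block to read
    raise ValueError("logical block is past the end of the file")


# Hypothetical file: its first index block is at address 2, the second at 7.
disk = {2: ([3, 4, 10, 13], 7), 7: ([14, 15, 11], END_OF_INDEX)}
print(find_block(disk, 2, 5))   # 15
```

Each hop in the loop is an extra index-block read, which is why this layout suits smaller files better than very large ones.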


File Layout: Indexed Allocation (6)
• Option 2: a multilevel index structure
• An index page can reference other index pages, or it can reference data blocks in the file itself (but not both)
• Depth of indexing structure can be adjusted based on the file’s size
• As before, a single-level index can index up to ~4MiB file sizes
• Above that size, a two-level index can be used:
  • Leaf pages in index will each index up to ~4MiB regions of the file
  • Each entry in the root of the index corresponds to ~4MiB of the file
  • A two-level index can be used for up to a ~4GiB file
  • A three-level index can be used for up to a ~4TiB file
  • etc.
• Index can be navigated very efficiently for direct access
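The size limits and the direct-access navigation above follow from simple arithmetic. The sketch below assumes 4 KiB blocks and 32-bit index entries, as in the slide's example; `max_file_size` and `index_path` are names made up for this illustration.

```python
BLOCK_SIZE = 4096            # 4 KiB blocks
FANOUT = BLOCK_SIZE // 4     # 1024 32-bit entries per index page


def max_file_size(levels):
    """Largest file (in bytes) addressable with the given index depth."""
    return FANOUT ** levels * BLOCK_SIZE


def index_path(position, levels):
    """Entry to follow in each index page, root first, for a byte position."""
    block_no = position // BLOCK_SIZE
    if block_no >= FANOUT ** levels:
        raise ValueError("position exceeds what this index depth can address")
    return [(block_no // FANOUT ** level) % FANOUT
            for level in range(levels - 1, -1, -1)]


print(max_file_size(1) // 2**20, "MiB")   # 4 MiB
print(max_file_size(2) // 2**30, "GiB")   # 4 GiB
print(max_file_size(3) // 2**40, "TiB")   # 4 TiB
print(index_path(5 * 2**20, 2))           # [1, 256]
```

A lookup touches exactly one index page per level, so even a three-level index needs only three index reads before the data block.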


File Layout: Indexed Allocation (7)
• Option 3: hybrid approach that blends other approaches
• Example: UNIX Ext2 file system
• Root index node (i-node) holds file metadata
• Root index also holds pointers to the first 12 disk blocks
  • Small files (e.g. up to ~50KB) only require a single index block
  • Called direct blocks
• If this is insufficient, one of the index pointers is used for single indirect blocks
  • One additional index block is introduced to the structure, like linked organization
  • Extends file size up to e.g. multiple-MB files


File Layout: Indexed Allocation (8)
• For even larger files, the next index pointer is used for double indirect blocks
  • These blocks are accessed via a two-level index hierarchy
  • Allows for very large files, up into multiple GB in size
• If this is insufficient, the last root-index pointer is used for triple indirect blocks
  • These blocks use a three-level index hierarchy
  • Allows file sizes up into TB
• A size limit is imposed…
• More recent extensions to this filesystem format allow for larger files (e.g. extents)
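The hybrid scheme can be summarized as a function that decides which part of the i-node covers a given logical block. This is a sketch, not the actual Ext2 code: it assumes 12 direct pointers (from the slide), 4 KiB blocks, and 32-bit pointers (so 1024 pointers per index block); `classify` is a hypothetical helper.

```python
DIRECT = 12    # direct pointers in the i-node
PTRS = 1024    # pointers per index block (assumes 4 KiB blocks, 32-bit entries)


def classify(block_no):
    """Return (kind, indices to follow) for a logical block number."""
    if block_no < DIRECT:
        return "direct", [block_no]
    block_no -= DIRECT
    if block_no < PTRS:
        return "single indirect", [block_no]
    block_no -= PTRS
    if block_no < PTRS ** 2:
        return "double indirect", [block_no // PTRS, block_no % PTRS]
    block_no -= PTRS ** 2
    if block_no < PTRS ** 3:
        return "triple indirect", [block_no // PTRS ** 2,
                                   (block_no // PTRS) % PTRS,
                                   block_no % PTRS]
    raise ValueError("beyond the maximum file size")


print(classify(11))     # ('direct', [11])
print(classify(12))     # ('single indirect', [0])
print(classify(2000))   # ('double indirect', [0, 964])
```

The final `ValueError` corresponds to the size limit mentioned above: once the triple indirect range is exhausted, the i-node has no further pointers to use.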


Files and Processes
• The OS maintains a buffer of storage blocks in memory
  • Storage devices are often much slower than the CPU; use caching to improve performance of reads and writes
• Multiple processes can open a file at the same time…

[Figure: Process A and Process B each have per-process kernel data holding a files[] table. Table entries refer to records in global kernel data (offset, flags, v_ptr), which in turn refer to a file control block (filename, path, size, flags, file_ops, i_node) backed by the storage cache.]

Files and Processes (2)
• Very common to have different processes perform reads and writes on the same open file
• OSes tend to vary in how they handle this circumstance, but standard APIs can manage these interactions


Files and Processes (3)
• Multiple reads on the same file generally do not block each other, even for overlapping reads
• Generally, a read that occurs after a write should reflect the completion of that write operation
• Writes should sometimes block each other, but OSes vary widely in how they handle this
  • e.g. Linux prevents multiple concurrent writes to the same file
• Most important situation to get correct is appending to file
  • Two operations must be performed: file is extended, then write is performed into new space
  • If this task isn’t atomic, results will likely be completely broken files
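On POSIX systems, one standard way to get the atomic "extend, then write" behavior is to open the file with `O_APPEND`, which makes each write land at the current end of file. The sketch below is an illustration rather than anything from the lecture: it uses two descriptors in one process to stand in for two writers, and the file name and records are made up.

```python
import os
import tempfile


def demo():
    """Interleave appends from two independent descriptors on one file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
        a = os.open(path, flags, 0o644)   # stands in for writer A
        b = os.open(path, flags, 0o644)   # stands in for writer B
        try:
            os.write(a, b"A1\n")
            os.write(b, b"B1\n")   # appended after A1, even though b's own offset is 0
            os.write(a, b"A2\n")
        finally:
            os.close(a)
            os.close(b)
        with open(path, "rb") as f:
            return f.read()


print(demo())   # b'A1\nB1\nA2\n'
```

Without `O_APPEND`, the write through `b` would start at offset 0 and overwrite the first record.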


Files and Processes (4)
• OSes have several ways to govern concurrent file access
• Often, entire files can be locked in shared or exclusive mode
  • e.g. Windows CreateFile() API call allows a file to be locked in one of several modes when it’s opened or created
  • Other processes that attempt to perform conflicting operations are prevented from doing so by the operating system
• Some OSes provide advisory file-locking operations
  • Advisory locks aren’t enforced on actual file-IO operations
  • They are only enforced when processes participate in acquiring and releasing these locks
• Example: UNIX flock() acquires and releases advisory locks on an entire file
  • Processes calling flock() can be blocked if a conflicting lock is held
  • If a process decides to just directly access the flock()’d file, the OS won’t stop it!
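The advisory nature of flock() can be observed from Python's `fcntl` module on a Unix-like system. This sketch assumes Linux-style semantics, where two separate opens of the same file hold independent flock() locks even within one process; the `demo` function and the data written are made up for illustration.

```python
import fcntl
import tempfile


def demo():
    """Hold an exclusive flock(), then try to lock and write via a second open."""
    with tempfile.NamedTemporaryFile() as holder:
        fcntl.flock(holder.fileno(), fcntl.LOCK_EX)        # take the advisory lock
        with open(holder.name, "r+b") as other:            # independent open of the same file
            try:
                fcntl.flock(other.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
                second_lock = True
            except OSError:                                # conflicting lock is held
                second_lock = False
            other.write(b"written without the lock")       # plain I/O is not prevented
            other.flush()
        fcntl.flock(holder.fileno(), fcntl.LOCK_UN)
        holder.seek(0)
        return second_lock, holder.read()


if __name__ == "__main__":
    print(demo())
```

The second flock() call is refused because it participates in the locking protocol; the write succeeds because it does not.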


Files and Processes (5)
• Example: UNIX lockf() function can acquire and release advisory locks on a region of a file
  • i.e. lock a section of the file (lockf() locks are exclusive; fcntl() also offers shared region locks)
  • Windows has a similar capability
• lockf() is commonly a wrapper around fcntl(); flock() is a wrapper on some systems, but a separate system call on others (e.g. Linux)
• fcntl() can perform many different operations on files:
  • Duplicate a file descriptor
  • Get and set control flags on open files
  • Enable or disable various kinds of I/O signals for open files
  • Acquire or release locks on files or ranges of files
  • etc.
• Some OSes also provide mandatory file-locking support
  • Processes are forced to abide by the current set of file locks
  • e.g. Linux has mandatory file-locking support, but this is non-standard
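Region locks can be tried out with Python's `fcntl.lockf()`, which is documented as a wrapper around the fcntl() locking calls. Because these locks belong to a process, the sketch below forks a child to act as the second participant. It assumes a Unix-like system; the byte ranges and the exit-code encoding are choices made for this example.

```python
import fcntl
import os
import tempfile


def try_lock(fd, start, length):
    """Attempt a non-blocking exclusive lock on bytes [start, start + length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start)
        return True
    except OSError:
        return False


def demo():
    """Parent locks bytes 0-9; the child reports which regions it could lock."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 20)
        f.flush()
        fcntl.lockf(f.fileno(), fcntl.LOCK_EX, 10, 0)    # parent: bytes 0-9
        pid = os.fork()
        if pid == 0:                                     # child process
            first = try_lock(f.fileno(), 0, 10)          # overlaps the parent's lock
            second = try_lock(f.fileno(), 10, 10)        # disjoint region
            os._exit((1 if first else 0) | (2 if second else 0))
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)    # bit 0: got bytes 0-9, bit 1: got bytes 10-19


if __name__ == "__main__":
    print(demo())
```

A result of 2 means the child was refused the overlapping region but obtained the disjoint one, which is the point of locking by region rather than by whole file.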


File Deletion
• File deletion is a generally straightforward operation
  • Specific implementation details depend heavily on the file system format
• General procedure:
  • Remove the directory entry referencing the file
  • If the file system contains no other hard-links to the file, record that all of the file’s blocks are now available for other files to use
• The file system must record what blocks are available for use when files are created or extended
  • Often called a free-space list, although there are many different ways to record this information
  • Some file systems already have a way of doing this, e.g. FAT formats simply mark clusters as unused in the table
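The general procedure can be modeled with a toy in-memory file system. Everything here (`ToyFS`, its fields, the paths and block numbers) is hypothetical and exists only to show the order of operations: drop the directory entry, decrement the link count, and release the blocks when the count reaches zero.

```python
class ToyFS:
    """In-memory model of directory entries, link counts, and free space."""

    def __init__(self):
        self.directory = {}        # path -> file id
        self.files = {}            # file id -> {"links": int, "blocks": list}
        self.free_blocks = set()   # stands in for the free-space list

    def create(self, path, file_id, blocks):
        self.files[file_id] = {"links": 1, "blocks": list(blocks)}
        self.directory[path] = file_id

    def link(self, new_path, existing_path):
        file_id = self.directory[existing_path]
        self.files[file_id]["links"] += 1      # one more hard link
        self.directory[new_path] = file_id

    def unlink(self, path):
        file_id = self.directory.pop(path)     # remove the directory entry
        meta = self.files[file_id]
        meta["links"] -= 1
        if meta["links"] == 0:                 # no other hard links remain
            self.free_blocks.update(meta["blocks"])
            del self.files[file_id]


fs = ToyFS()
fs.create("/home/user1/A", file_id=1, blocks=[3, 4, 10])
fs.link("/home/user2/A", "/home/user1/A")
fs.unlink("/home/user1/A")
print(sorted(fs.free_blocks))   # []  (still linked from user2)
fs.unlink("/home/user2/A")
print(sorted(fs.free_blocks))   # [3, 4, 10]
```

Note that `unlink` never touches the data blocks themselves; it only updates bookkeeping.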


Free Space Management
• A simple approach: a bitmap with one bit per block
  • If a block is free, the corresponding bit is 1
  • If a block is in use, the corresponding bit is 0
• Simple to find an available block, or a run of available blocks
  • Can make more efficient by accessing the bitmap in units of words, skipping over entire words that are 0
• This bitmap clearly occupies a certain amount of space
  • e.g. a 4KiB block can record the state of 32768 blocks, or 128MiB of storage space (with 4KiB blocks)
  • A 1TiB disk would require 8192 blocks (32MiB) to record the disk’s free-space bitmap
• The file system can break this bitmap into multiple parts
  • e.g. Ext2 manages a free-block bitmap for groups of blocks, with the constraint that each group’s bitmap must always fit into one block
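A minimal sketch of the bitmap approach, using the slide's convention that a 1 bit means "free". The 32-bit word size, the bit ordering (bit i of word w describes block w × 32 + i), and the function names are assumptions made for this example.

```python
WORD_BITS = 32


def first_free_block(words):
    """Return the number of the first free block (bit = 1), or -1 if none."""
    for w, word in enumerate(words):
        if word == 0:
            continue                      # every block in this word is in use
        lowest = word & -word             # isolate the lowest set bit
        return w * WORD_BITS + lowest.bit_length() - 1
    return -1


def allocate(words):
    """Claim the first free block by clearing its bit; return its number or -1."""
    block = first_free_block(words)
    if block >= 0:
        words[block // WORD_BITS] &= ~(1 << (block % WORD_BITS))
    return block


def release(words, block):
    """Mark a block as free again by setting its bit."""
    words[block // WORD_BITS] |= 1 << (block % WORD_BITS)


bitmap = [0, 0b10100]          # blocks 34 and 36 are free
print(allocate(bitmap))        # 34
print(allocate(bitmap))        # 36
print(allocate(bitmap))        # -1
release(bitmap, 5)
print(first_free_block(bitmap))  # 5
```

The `word == 0` test is the word-skipping optimization: 32 in-use blocks are ruled out with a single comparison.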


Free Space Management (2)
• Another simple approach: a linked list of free blocks
  • The file system records the first block in the free list
  • Each free block holds a pointer to the next block
• Also very simple to find an available block
  • Much harder to find a run of contiguous blocks that are available
• Tends to be more I/O costly than the bitmap approach
  • Requires additional disk accesses to scan and update the free-list of blocks
  • Also, wastes a lot of space in the free list…
• A better use of free blocks: store the addresses of many free blocks in each block of the linked list
  • Only a subset of the free blocks are required for this information
• Still generally requires more space than bitmap approach


Free Space Management (3)
• Many other ways of recording free storage space
  • e.g. record runs of free contiguous blocks with (start, count) values
  • e.g. maintain more sophisticated maps of free space
• A common theme: many of these approaches don’t require actually touching the newly deallocated blocks
  • e.g. update a bitmap, store a block-pointer in another block, …
• Storage devices frequently still contain the old contents of deleted or truncated files
  • Called data remanence
• Sometimes this characteristic is useful for data recovery
  • e.g. file-undelete utilities
  • e.g. computer forensics when investigating crimes
• Also generally not difficult to securely erase devices
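The (start, count) representation mentioned at the top of this slide can be sketched as follows. The helper names and the first-fit allocation policy are assumptions chosen for illustration; real file systems use a variety of policies.

```python
def free_runs(free_blocks):
    """Collapse a collection of free block numbers into sorted (start, count) runs."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((block, 1))                     # start a new run
    return runs


def allocate_run(runs, count):
    """First fit: take `count` contiguous blocks; return the start block or -1."""
    for i, (start, length) in enumerate(runs):
        if length >= count:
            if length == count:
                del runs[i]
            else:
                runs[i] = (start + count, length - count)
            return start
    return -1


runs = free_runs({3, 4, 5, 9, 12, 13})
print(runs)                    # [(3, 3), (9, 1), (12, 2)]
print(allocate_run(runs, 2))   # 3
print(runs)                    # [(5, 1), (9, 1), (12, 2)]
```

As with the bitmap, allocation and release only update the run list; the contents of the blocks themselves are never touched.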


Free Space and SSDs
• Solid State Drives (SSDs) and other flash-based devices often complicate management of free space
• SSDs are block devices; reads and writes are a fixed size
• Problem: can only write to a block that is currently empty
• Blocks can only be erased in groups, not individually!
  • An erase block is a group of blocks that are erased together
  • Erase blocks are much larger than read/write blocks
  • A read/write block might be 4KiB or 8KiB…
  • Erase blocks are often 128 or 256 of these blocks (e.g. 2MiB)!
• As long as some blocks on the SSD are empty, writes can be performed immediately
• If the SSD has no more empty blocks, a group of blocks must be erased to provide more empty blocks


Solid State Drives
• Solid State Drives include a flash translation layer (FTL) that maps logical block addresses to physical memory cells
  • Recall: system uses Logical Block Addressing to access disks
• When files are written to the SSD, data must be stored in empty cells (i.e. old contents can’t simply be overwritten)
• If a file is edited, the SSD sees a write issued against the same logical block
  • e.g. block 2 in file F1 is written
  • SSD can’t just replace block’s contents…
  • SSD marks the cell as “old,” then stores the new block data in another cell, and updates the mapping in the FTL
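The remapping step can be illustrated with a toy model. `ToyFTL` is hypothetical and heavily simplified (no erase blocks, no wear tracking); it only shows how a rewrite of a logical block lands in a fresh cell while the previous cell is marked old.

```python
EMPTY, LIVE, OLD = "empty", "live", "old"


class ToyFTL:
    """Toy flash translation layer: logical block -> physical cell."""

    def __init__(self, num_cells):
        self.state = [EMPTY] * num_cells
        self.data = [None] * num_cells
        self.mapping = {}                      # logical block -> cell index

    def write(self, logical, payload):
        cell = self.state.index(EMPTY)         # raises ValueError if no cell is empty
        previous = self.mapping.get(logical)
        if previous is not None:
            self.state[previous] = OLD         # old contents cannot be overwritten in place
        self.state[cell] = LIVE
        self.data[cell] = payload
        self.mapping[logical] = cell           # update the mapping
        return cell

    def read(self, logical):
        return self.data[self.mapping[logical]]


ftl = ToyFTL(num_cells=4)
ftl.write("F1.2", b"version 1")
ftl.write("F1.2", b"version 2")    # same logical block, different cell
print(ftl.state)                   # ['old', 'live', 'empty', 'empty']
print(ftl.read("F1.2"))            # b'version 2'
```

From the host's point of view the logical block never moved; only the FTL knows that its contents now live in a different cell.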

[Figure: the flash translation layer maps file blocks F1.1 through F3.4 onto cells. After block F1.2 is rewritten, the new version F1.2' occupies a previously empty cell and the cell holding the original F1.2 is marked “old”.]

Solid State Drives (2)
• Over time, SSD ends up with few or no available cells
  • e.g. a series of writes to our SSD that results in all cells being used
• SSD must erase at least one block of cells to be reused
• Best case is when an entire erase-block can be reclaimed
  • SSD erases the entire block, and then carries on as before

[Figure: after a series of writes, every cell is in use and many hold “old” data. The erase block holding F1.1, F1.2, F1.3, and F2.1 contains only old cells, so the SSD erases it; a later write (F2.1'') then goes into the reclaimed space.]

Solid State Drives (3)
• More complicated when an erase block still holds data
  • e.g. SSD decides it must reclaim the third erase-block
  • SSD must relocate the current contents before erasing
• Result: sometimes a write to the SSD incurs additional writes within the SSD
  • Phenomenon is called write amplification
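Write amplification can be made concrete by counting host writes against writes that actually reach the flash. The model below is a deliberately small sketch: `ToySSD` is hypothetical, it buffers surviving data in memory while erasing (real devices typically keep spare cells instead), and it picks the erase block with the fewest live cells as the victim.

```python
class ToySSD:
    """Counts host writes versus flash writes, including relocation."""

    def __init__(self, erase_blocks, cells_per_block):
        self.cpb = cells_per_block
        self.cells = [None] * (erase_blocks * cells_per_block)   # None = empty
        self.live = {}              # logical block -> cell holding its current version
        self.host_writes = 0
        self.flash_writes = 0

    def _live_cells(self, block):
        cells = range(block * self.cpb, (block + 1) * self.cpb)
        return [c for c in cells if self.live.get(self.cells[c]) == c]

    def _program(self, logical):
        cell = self.cells.index(None)          # first empty cell
        self.cells[cell] = logical
        self.live[logical] = cell
        self.flash_writes += 1

    def _reclaim(self):
        blocks = range(len(self.cells) // self.cpb)
        victim = min(blocks, key=lambda b: len(self._live_cells(b)))
        survivors = [self.cells[c] for c in self._live_cells(victim)]
        if len(survivors) == self.cpb:
            raise RuntimeError("no old cells to reclaim: device is full")
        for c in range(victim * self.cpb, (victim + 1) * self.cpb):
            self.cells[c] = None               # erase the whole block at once
        for logical in survivors:
            self._program(logical)             # relocation: extra flash writes

    def write(self, logical):
        self.host_writes += 1
        self.live.pop(logical, None)           # any previous version becomes old
        if None not in self.cells:
            self._reclaim()
        self._program(logical)


ssd = ToySSD(erase_blocks=2, cells_per_block=2)
for name in ["A", "B", "C", "D", "A"]:         # the final write finds no empty cell
    ssd.write(name)
print(ssd.host_writes, ssd.flash_writes)       # 5 6
```

The fifth host write forces an erase of the block holding the old copy of A and the live copy of B, so B is rewritten as well: six flash writes for five host writes.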

[Figure: the erase block chosen for reclamation still holds current data (F3.1' and F3.4'). The SSD copies those cells into another block, then erases the original block.]

Solid State Drives (4)
• SSDs must carefully manage this process to avoid uneven wear of their memory cells
  • Cells can only survive so many erase cycles, then become useless
• How does the SSD know when a cell’s contents are no longer needed? (i.e. when to mark the cell “old”)
• The SSD only knows because it sees several writes to the same logical block
  • The new version replaces the old version, so the old cell is no longer used for storage


SSDs and File Deletion
• Problem: for most file system formats, file deletion doesn’t actually touch the blocks in the file themselves!
  • File systems try to avoid this anyway, because storage I/O is slow!
  • Want to update the directory entry and the free-space map only, and want this to be as efficient as possible
• Example: File F3 is deleted from the SSD
  • SSD will only see the block with the directory entry change, and block(s) holding the free map
• The SSD has no idea that file F3’s data no longer needs to be preserved
  • e.g. if the SSD decides to erase bank 2, it will still move F3.2 and F3.3 to other cells, even though the OS and the users don’t care!


SSDs, File Deletion and TRIM
• To deal with this, SSDs introduced the TRIM command
  • (TRIM is not an acronym)
• When the filesystem is finished with certain logical blocks, it can issue a TRIM command to inform the SSD that the data in those blocks can be discarded
• Previous example: file F3 is deleted
  • The OS can issue a TRIM command to inform SSD that all associated blocks are now unused
• TRIM allows the SSD to manage its cells much more efficiently
  • Greatly reduces write amplification issues
  • Helps reduce wear on SSD memory cells
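The effect of TRIM on garbage collection can be shown with a small calculation. The function below is a sketch: `cells_to_relocate` is a made-up helper, and the contents of the erase block and the set of live blocks are hypothetical values loosely modeled on the F3 example.

```python
def cells_to_relocate(erase_block, live_blocks, trimmed=()):
    """List the logical blocks that must be copied before erasing `erase_block`.

    erase_block: logical block names stored in the block's cells
    live_blocks: logical blocks the SSD still believes are in use
    trimmed:     logical blocks the OS has reported as unused via TRIM
    """
    still_needed = set(live_blocks) - set(trimmed)
    return [name for name in erase_block if name in still_needed]


# Hypothetical state: F3.1 was already rewritten elsewhere, so it is old.
bank = ["F2.2", "F3.1", "F3.2", "F3.3"]
live = {"F2.2", "F3.2", "F3.3"}

print(cells_to_relocate(bank, live))                            # ['F2.2', 'F3.2', 'F3.3']
print(cells_to_relocate(bank, live, trimmed={"F3.2", "F3.3"}))  # ['F2.2']
```

Without TRIM the SSD copies two cells of a deleted file before erasing; with TRIM it copies only the one cell that still matters.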

[Figure: after the TRIM command, the cells holding file F3’s blocks are also marked “old”.]

SSDs, File Deletion and TRIM (2)
• Still a few issues to resolve with TRIM at this point
• Biggest one is TRIM wasn’t initially a queued command
  • Couldn’t include TRIM commands in a mix of other read/write commands being sent to the device
  • TRIM must be performed separately, in isolation from other operations
• TRIM must be issued in a batch-mode way, when it won’t interrupt other work
  • e.g. can’t issue TRIM commands immediately after each delete operation
• This was fixed in the SATA 3.1 specification
  • A queued version of TRIM was introduced
• Another issue: not all OSes/filesystems support TRIM (or it is not enabled by default)
