file system - web.cs.ucdavis.edupandey/teaching/ecs150/lects/07fs.pdf · spring 2011. ecs 150...

70
File System Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011

Upload: others

Post on 22-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

File System

Raju PandeyDepartment of Computer Sciences

University of California, DavisSpring 2011

ECS 150 (Operating Systems) File System (1), 2 Spring 2011 UC Davis

Hardware

ApplicationProgram

File

Mgr

Dev

ice

Mgr

Mem

ory

Mgr

Proc

ess M

gr

UNIX

File

Mgr

Dev

ice

Mgr

Mem

ory

Mgr

Proc

ess M

gr

Windows

open()read()

close()

write()

lseek()

CreateFile()ReadFile()CloseHandle()

SetFilePointer()

WriteFile()mount()

Overall view

Files

• A file is a collection of data with some properties

contents, size, owner, last read/write time, protection …

• Files may also have types

understood by file system

o device, directory, symbolic link

understood by other parts of OS or by runtime libraries

o executable, dll, source code, object code, text file, …

• Type can be encoded in the file’s name or contents

windows encodes type in name

o .com, .exe, .bat, .dll, .jpg, .mov, .mp3, …

old Mac OS stored the name of the creating program along with the file

unix has a smattering of both

o in content via magic numbers or initial characters (e.g., #!)

ECS 150 (Operating Systems) File System (1), 3 Source: Gribble, Lazowska, Levy, Zahorjan

File Management

• The file manager administers the collection by:

Storing the information on a device

Mapping the block storage to a logical view

Allocating/deallocating storage

Providing file directories

• What abstraction should be presented to programmer?

ECS 150 (Operating Systems) File System (1), 4 Spring 2011 UC Davis

Records

Applications

Structured Record Files

Record-Stream Translation

Stream-Block Translation

Byte Stream Files

Storage device

OS DiskStd.

RuntimeLibrary

App.Code

Device-typeDependent Commands

Syscalls

ProcedureCalls

Whatever…

Interface Layers

ECS 150 (Operating Systems) File System (1), 5 Source: Gribble, Lazowska, Levy, Zahorjan

OS DiskStd.

RuntimeLibrary

App.Code

Array of BlocksDirectories,Directory Entries,

Files,…Whatever

/

etcroot

OS +a tiny bit of

file type / structure

/

etcroot.xls

Exported Abstractions

ECS 150 (Operating Systems) File System (1), 6 Source: Gribble, Lazowska, Levy, Zahorjan

OS Disk

/

etcroot

1. Hide hardware specific interface

2. Allocate disk blocks3. Check permissions4. Understand directory

file structure5. Maintain metadata6. Performance7. Flexibility

Why does the OS define directories? Why not leave that to the library/application layer?

(Why would you want to leave it to the app/library?)

Primary Roles of the OS (file system)

ECS 150 (Operating Systems) File System (1), 7 Source: Gribble, Lazowska, Levy, Zahorjan

File access methods

• Some file systems provide different access methods that specify ways the application will access data

sequential accesso read bytes one at a time, in order

direct accesso random access given a block/byte #

record accesso file is array of fixed- or variable-sized records

indexed accesso FS contains an index to a particular field of each record in a fileo apps can find a file based on value in that record (similar to DB)

• Why do we care about distinguishing sequential from direct access?

what might the FS do differently in these cases?

ECS 150 (Operating Systems) File System (1), 8 Source: Gribble, Lazowska, Levy, Zahorjan

Byte Stream File Interface

fileID = open(fileName)close(fileID)read(fileID, buffer, length)write(fileID, buffer, length)seek(fileID, filePosition)

ECS 150 (Operating Systems) File System (1), 9 Spring 2011 UC Davis

Low Level Files

Stream-Block Translation

b0 b1 b2 bi......

fid = open(“fileName”,…);…read(fid, buf, buflen);…close(fid);

int open(…) {…}int close(…) {…}int read(…) {…}int write(…) {…}int seek(…) {…}

Storage device response to commands

ECS 150 (Operating Systems) File System (1), 10 Spring 2011 UC Davis

Structured Files

Records

Record-Block Translation

ECS 150 (Operating Systems) File System (1), 11 Spring 2011 UC Davis

Record-Oriented Sequential Files

Logical Record

fileID = open(fileName)close(fileID)getRecord(fileID, record)putRecord(fileID, record)seek(fileID, position)

ECS 150 (Operating Systems) File System (1), 12 Spring 2011 UC Davis

Example: Mail system

Record-Oriented Sequential Files

...H byte header k byte logical record

Logical Record

ECS 150 (Operating Systems) File System (1), 13 Spring 2011 UC Davis

Indexed Sequential File

• Suppose we want to directly access records, rather than sequentially

• Add an index to the file• Each record includes an index field

fileID = open(fileName)close(fileID)getRecord(fileID, index)index = putRecord(fileID, record)deleteRecord(fileID, index)

ECS 150 (Operating Systems) File System (1), 14 Spring 2011 UC Davis

Indexed Sequential File (cont)

Account #012345123456294376...529366...965987

Index

ik

j

index = i

index = k

index = j

Application structure

ECS 150 (Operating Systems) File System (1), 15 Spring 2011 UC Davis

Directories

• Directories provide:

a way for users to organize their files

a convenient file name space for both users and FS’s

• Most file systems support multi-level directories

naming hierarchies (/, /usr, /usr/local, /usr/local/bin, …)

• Most file systems support the notion of current directory

absolute names: fully-qualified starting from root of FSbash$ cd /usr/local

relative names: specified with respect to current directorybash$ cd /usr/local (absolute)

bash$ cd bin (relative, equivalent to cd /usr/local/bin)

ECS 150 (Operating Systems) File System (1), 16 Source: Gribble, Lazowska, Levy, Zahorjan

Single-Level Directory

• A single directory for all users

Naming problem

Grouping problem

ECS 150 (Operating Systems) File System (1), 17 Source: Stallings

Two-Level Directory

• Separate directory for each user

Path name

Can have the same file name for different user

Efficient searching

No grouping capability

ECS 150 (Operating Systems) File System (1), 18 Source: Stallings

Tree-Structured Directories

ECS 150 (Operating Systems) File System (1), 19 Source: Stallings

Tree-Structured Directories (Cont)

• Efficient searching

• Grouping Capability

• Current directory (working directory)cd /spell/mail/progtype list

ECS 150 (Operating Systems) File System (1), 20 Source: Stallings

Tree-Structured Directories (Cont)

• Absolute or relative path name• Creating a new file is done in current directory• Delete a file

rm <file-name>• Creating a new subdirectory is done in current

directorymkdir <dir-name>

Example: if in current directory /mailmkdir count

mail

prog copy prt exp count

Deleting “mail” ⇒ deleting the entire subtree rooted by “mail”

ECS 150 (Operating Systems) File System (1), 21 Source: Stallings

Acyclic-Graph Directories

• Have shared subdirectories and files

ECS 150 (Operating Systems) File System (1), 22 Source: Stallings

Acyclic-Graph Directories (Cont.)

• Two different names (aliasing)

• If dict deletes list ⇒ dangling pointerSolutions:

Backpointers, so we can delete all pointersVariable size records a problemBackpointers using a daisy chain organizationEntry-hold-count solution

• New directory entry typeLink – another name (pointer) to an existing fileResolve the link – follow pointer to locate the file

ECS 150 (Operating Systems) File System (1), 23 Source: Stallings

File System Mounting

• A file system must be mounted before it can be accessed

• A file system is mounted at a mount point (a directory)

ECS 150 (Operating Systems) File System (1), 24 Source: Stallings

(a) Existing. (b) Unmounted Partition

ECS 150 (Operating Systems) File System (1), 25 Source: Stallings

Mount Point

ECS 150 (Operating Systems) File System (1), 26 Source: Stallings

Directory internals

• A directory is typically just a file that happens to contain special metadata

directory = list of (name of file, file attributes)attributes include such things as:o size, protection, location on disk, creation time, access time, …

the directory list is usually unordered (effectively random)o when you type “ls”, the “ls” command sorts the results for you

ECS 150 (Operating Systems) File System (1), 27 Source: Gribble, Lazowska, Levy, Zahorjan

Path name translation

• Let’s say you want to open “/one/two/three”fd = open(“/one/two/three”, O_RDWR);

• What goes on inside the file system?open directory “/” (well known, can always find)search the directory for “one”, get location of “one”open directory “one”, search for “two”, get location of “two”open directory “two”, search for “three”, get loc. of “three”open file “three”(of course, permissions are checked at each step)

• FS spends lots of time walking down directory pathsthis is why open is separate from read/write (session state)OS will cache prefix lookups to enhance performanceo /a/b, /a/bb, /a/bbb all share the “/a” prefix

ECS 150 (Operating Systems) File System (1), 28 Source: Gribble, Lazowska, Levy, Zahorjan

File protection

• FS must implement some kind of protection systemto control who can access a file (user)to control how they can access it (e.g., read, write, or exec)

• More generally:generalize files to objects (the “what”)generalize users to principals (the “who”, user or program)generalize read/write to actions (the “how”, or operations)

• A protection system dictates whether a given action performed by a given principal on a given object should be allowed

e.g., you can read or write your files, but others cannote.g., your can read /etc/motd but you cannot write to it

ECS 150 (Operating Systems) File System (1), 29 Source: Gribble, Lazowska, Levy, Zahorjan

Model for representing protection

• Two different ways of thinking about it:access control lists (ACLs)o for each object, keep list of principals and principals’ allowed

actionscapabilitieso for each principal, keep list of objects and principal’s allowed

actions

• Both can be represented with the following matrix:

/etc/passwd /home/gribble /home/guest

root rw rw rw

gribble r rw r

guest rprincipals

objects

ACL

capability

ECS 150 (Operating Systems) File System (1), 30 Source: Gribble, Lazowska, Levy, Zahorjan

ACLs vs. Capabilities

• Capabilities are easy to transferthey are like keys: can hand them offthey make sharing easy

• ACLs are easier to manageobject-centric, easy to grant and revokeo to revoke capability, need to keep track of principals that have ito hard to do, given that principals can hand off capabilities

• ACLs grow large when object is heavily sharedcan simplify by using “groups”o put users in groups, put groups in ACLso you are all in the “VMware powerusers” group on Win2K

additional benefito change group membership, affects ALL objects that have this group

in its ACL

ECS 150 (Operating Systems) File System (1), 31 Source: Gribble, Lazowska, Levy, Zahorjan

Implementing Low Level Files

• Secondary storage device contains:Volume directory (sometimes a root directory for a file system)External file descriptor for each fileThe file contents

• Manages blocksAssigns blocks to files (descriptor keeps track)Keeps track of available blocks

• Maps to/from byte stream

ECS 150 (Operating Systems) File System (1), 32 Spring 2011 UC Davis

Disk Organization

Blk0 Blk1 Blkk-1

Blkk Blkk+1 Blk2k-1

Track 0, Cylinder 0

Track 0, Cylinder 1

Blk Blk Blk Track 1, Cylinder 0

Blk Blk Blk Track N-1, Cylinder 0

Blk Blk Blk Track N-1, Cylinder M-1

Boot Sector Volume Directory

ECS 150 (Operating Systems) File System (1), 33 Spring 2011 UC Davis

Low-level File System Architecture

b0 b1 b2 b3 bn-1… …

Block 0

...

Sequential Device Randomly Accessed Device

ECS 150 (Operating Systems) File System (1), 34 Spring 2011 UC Davis

File Descriptors

•External name•Current state•Sharable•Owner•User•Locks•Protection settings•Length•Time of creation•Time of last modification•Time of last access•Reference count•Storage device details

ECS 150 (Operating Systems) File System (1), 35 Spring 2011 UC Davis

An open() Operation

• Locate the on-device (external) file descriptor• Extract info needed to read/write file• Authenticate that process can access the file • Create an internal file descriptor in primary memory• Create an entry in a “per process” open file status table• Allocate resources, e.g., buffers, to support file usage

ECS 150 (Operating Systems) File System (1), 36 Spring 2011 UC Davis

File Manager Data Structures

External File Descriptor

Open FileDescriptor

Copy info from external to the open file descriptor

1

Process-FileSession

Keep the state of the process-file session

2

Return a reference to the data structure

3

ECS 150 (Operating Systems) File System (1), 37 Spring 2011 UC Davis

Opening a UNIX File

fid = open(“fileA”, flags);…read(fid, buffer, len);

0 stdin1 stdout2 stderr3 ...

Open File Table

File structure

inode

Internal File Descriptor

On-Device File Descriptor

ECS 150 (Operating Systems) File System (1), 38 Spring 2011 UC Davis

Block Management

• The job of selecting & assigning storage blocks to the file

• For a fixed sized file of k blocksFile of length m requires N = ⎡m/k⎤ blocksByte bi is stored in block ⎣i/k⎦

• Three basic strategies:Contiguous allocationLinked listsIndexed allocation

ECS 150 (Operating Systems) File System (1), 39 Spring 2011 UC Davis

Contiguous Allocation

• Maps the N blocks into N contiguous blocks on the secondary storage device

• Difficult to support dynamic file sizes

Head position 237…First block 785Number of blocks 25

File descriptor

ECS 150 (Operating Systems) File System (1), 40 Spring 2011 UC Davis

ECS 150 (Operating Systems) File System (1), 41 Source: Stallings

ECS 150 (Operating Systems) File System (1), 42 Source: Stallings

After Compaction

Linked Lists or Chained Allocation

• Each block contains a header withNumber of bytes in the blockPointer to next block

• Blocks need not be contiguous• Files can expand and contract• Seeks can be slow

First block…

Head: 417...

LengthByte 0

Byte 4095...

LengthByte 0

Byte 4095...

LengthByte 0

Byte 4095...

Block 0 Block 1 Block N-1

ECS 150 (Operating Systems) File System (1), 43 Spring 2011 UC Davis

ECS 150 (Operating Systems) File System (1), 44 Source: Stallings

After Consolidation

ECS 150 (Operating Systems) File System (1), 45 Source: Stallings

Indexed Allocation

• Extract headers and put them in an index• Simplify seeks• May link indices together (for large files)

Index block…

Head: 417...

Byte 0

Byte 4095...

Byte 0

Byte 4095...

Byte 0

Byte 4095...

Block 0

Block 1

Block N-1

Length

Length

Length

ECS 150 (Operating Systems) File System (1), 46 Spring 2011 UC Davis

Indexed allocation with Block portions

ECS 150 (Operating Systems) File System (1), 47 Source: Stallings

Indexed allocation with variable length portions

ECS 150 (Operating Systems) File System (1), 48 Source: Stallings

The original Unix file system

• Dennis Ritchie and Ken Thompson, Bell Labs, 1969• “UNIX rose from the ashes of a multi-organizational effort in

the early 1960s to develop a dependable timesharing operating system” – Multics

• Designed for a “workgroup” sharing a single system• Did its job exceedingly well

Although it has been stretched in many directions and made ugly in the process

• A wonderful study in engineering tradeoffs

ECS 150 (Operating Systems) File System (1), 49 Source: Gribble, Lazowska, Levy, Zahorjan

(Old) Unix disks are divided into five parts …

• Boot blockcan boot the system by loading from this block

• Superblockspecifies boundaries of next 3 areas, and contains head of freelists of inodes and file blocks

• i-node areacontains descriptors (i-nodes) for each file on the disk; all i-nodes are the same size; head of freelist is in the superblock

• File contents areafixed-size blocks; head of freelist is in the superblock

• Swap areaholds processes that have been swapped out of memory

ECS 150 (Operating Systems) File System (1), 50 Source: Gribble, Lazowska, Levy, Zahorjan

So …

• You can attach a disk to a dead system …• Boot it up …• Find, create, and modify files …

because the superblock is at a fixed place, and it tells you where the i-node area and file contents area areby convention, the second i-node is the root directory of the volume

ECS 150 (Operating Systems) File System (1), 51 Source: Gribble, Lazowska, Levy, Zahorjan

i-node format

• User number• Group number• Protection bits• Times (file last read, file last written, inode last written)• File code: specifies if the i-node represents a directory, an

ordinary user file, or a “special file” (typically an I/O device)

• Size: length of file in bytes• Block list: locates contents of file (in the file contents

area)more on this soon!

• Link count: number of directories referencing this i-node

ECS 150 (Operating Systems) File System (1), 52 Source: Gribble, Lazowska, Levy, Zahorjan

The flat (i-node) file system

• Each file is known by a number, which is the number of the i-node

seriously – 1, 2, 3, etc.!why is it called “flat”?

• Files are created empty, and grow when extended through writes

ECS 150 (Operating Systems) File System (1), 53 Source: Gribble, Lazowska, Levy, Zahorjan

The tree (directory, hierarchical) file system

• A directory is a flat file of fixed-size entries• Each entry consists of an i-node number and a file name

i-node number File name152 .18 ..

216 my_file4 another_file93 oh_my_god

144 a_directory

• It’s as simple as that!ECS 150 (Operating Systems) File System (1), 54 Source: Gribble, Lazowska,

Levy, Zahorjan

The “block list” portion of the i-node (Unix Version 7)

• Points to blocks in the file contents area• Must be able to represent very small and very large files. • Each inode contains 13 block pointers

first 10 are “direct pointers” (pointers to 512B blocks of file data)then, single, double, and triple indirect pointers

01

101112

… …

ECS 150 (Operating Systems) File System (1), 55 Source: Gribble, Lazowska, Levy, Zahorjan

So …

• Only occupies 13 x 4B in the i-node• Can get to 10 x 512B = a 5120B file directly

(10 direct pointers, blocks in the file contents area are 512B)

• Can get to 128 x 512B = an additional 65KB with a single indirect reference

(the 11th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to blocks holding file data)

• Can get to 128 x 128 x 512B = an additional 8MB with a double indirect reference

(the 12th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks holding file data)

ECS 150 (Operating Systems) File System (1), 56 Source: Gribble, Lazowska, Levy, Zahorjan

• Can get to 128 x 128 x 128 x 512B = an additional 1GB with a triple indirect reference

(the 13th pointer in the i-node gets you to a 512B block in the file contents area that contains 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks in the file contents area that contain 128 4B pointers to 512B blocks holding file data)

• Maximum file size is 1GB + a smidge• Berkeley Unix went to 1KB block sizes

What’s the effect on the maximum file size?o 256x256x256x1K = 17 GB + a smidge

What’s the price?

• Subsequently went to 4KB blocks1Kx1Kx1Kx4K = 4TB + a smidge

ECS 150 (Operating Systems) File System (1), 57 Source: Gribble, Lazowska, Levy, Zahorjan

Putting it all together

• The file system is just a huge data structure

superblock

inodefree list

file blockfree list

inode for ‘/’directory ‘/’(table of entries)

inode for‘usr/’

inode for‘var/’

directory ‘var/’(table of entries)

directory ‘usr/’(table of entries)

••••••inode for

‘bigfile.bin’

data blocks

indirection block

data blocks

Indirectionblock

indirectionblock

•••

•••

ECS 150 (Operating Systems) File System (1), 58 Source: Gribble, Lazowska, Levy, Zahorjan

59

The UNIX File System

The steps in looking up /usr/ast/mbox

File system layout

• One important goal of a file system is to lay this data structure out on disk

have to keep in mind the physical characteristics of the disk itself (seeks are expensive)and the characteristics of the workload (locality across files within a directory, sequential access to many files)

• Old UNIX’s layout is very inefficientconstantly seeking back and forth between inode area and data block area as you traverse the file system, or even as you sequentially read files

• Newer file systems are smarter• Newer storage devices (SSDs) change the constraints, but

not the basic data structure

ECS 150 (Operating Systems) File System (1), 60 Source: Gribble, Lazowska, Levy, Zahorjan

File system consistency

• Both i-nodes and file blocks are cached in memory• The “sync” command forces memory-resident disk

information to be written to disksystem does a sync every few seconds

• A crash or power failure between sync’s can leave an inconsistent disk

• You could reduce the frequency of problems by reducing caching, but performance would suffer big-time

ECS 150 (Operating Systems) File System (1), 61 Source: Gribble, Lazowska, Levy, Zahorjan

What do you do after a crash?

• Run a program called “fsck” to try to fix any consistency problems

• fsck has to scan the entire diskas disks are getting bigger, fsck is taking longer and longermodern disks: fsck can take a full day!

• Newer file systems try to help hereare more clever about the order in which writes happen, and where writes are directedo e.g., Journaling file system: collect recent writes in a log called a

journal. On crash, run through journal to replay against file system.

ECS 150 (Operating Systems) File System (1), 62 Source: Gribble, Lazowska, Levy, Zahorjan

VFS-based File Manager

File System IndependentPart of File Manager

Exports OS-specific API

Virtual File System Switch

MS-DOS Part ofFile Manager

ISO 9660 Part ofFile Manager

ext2 Part ofFile Manager

ECS 150 (Operating Systems) File System (1), 63 Spring 2011 UC Davis

Linux VFS

• Multiple interfaces build up VFS:

filesdentries inodessuperblock quota

• VFS can do all caching & provides utility fctns to FS

• FS provides methods to VFS; many are optional

File access

ECS 150 (Operating Systems) File System (1), 64 Source: P.J.Braam, CMU

User level file access

• Typical user level types and code:pathnames: “/myfile” file descriptors: fd = open(“/myfile”…)attributes in struct stat: stat(“/myfile”, &mybuf), chmod, chown...offsets: write, read, lseekdirectory handles: DIR *dh = opendir(“/mydir”)directory entries: struct dirent *ent = readdir(dh)

ECS 150 (Operating Systems) File System (1), 65 Source: P.J.Braam, CMU

VFS

• Manages kernel level file abstractions in one format for all file systems

• Receives system call requests from user level (e.g. write, open, stat, link)

• Interacts with a specific file system based on mount point traversal

• Receives requests from other parts of the kernel, mostly from memory management

ECS 150 (Operating Systems) File System (1), 66 Source: P.J.Braam, CMU

File system level

• Individual File Systemsresponsible for managing file & directory dataresponsible for managing meta-data: timestamps, owners, protection etctranslates data betweeno particular FS data: e.g. disk data, NFS data, Coda/AFS datao VFS data: attributes etc in standard format

e.g. nfs_getattr(….) returns attributes in VFS format, acquires attributes in NFS format to do so.

ECS 150 (Operating Systems) File System (1), 67 Source: P.J.Braam, CMU

Anatomy of stat system callsys_stat(path, buf) {

dentry = namei(path);if ( dentry == NULL ) return -

ENOENT;

inode = dentry->d_inode;rc =inode->i_op-

>i_permission(inode);if ( rc ) return -EPERM;

rc = inode->i_op->i_getattr(inode, buf);

dput(dentry);return rc;

}

Establish VFS data

Call into inode layer of filesystem

Call into inode layer of filesystem

ECS 150 (Operating Systems) File System (1), 68 Source: P.J.Braam, CMU

Anatomy of fstatfs system call

sys_fstatfs(fd, buf) { /* for things like “df” */

file = fget(fd);if ( file == NULL ) return -EBADF;

superb = file->f_dentry->d_inode->i_super;

rc = superb->sb_op->sb_statfs(sb, buf);return rc;

}

Call into superblock layer of filesystem

Translate fd to VFS data structure

ECS 150 (Operating Systems) File System (1), 69 Source: P.J.Braam, CMU

DOS FAT Files

DiskBlock

File Descriptor

DiskBlock

DiskBlock

…43

107254

File Access Table (FAT)

DiskBlock

DiskBlock

DiskBlock

…43

107

10743

254

254

File Descriptor

ECS 150 (Operating Systems) File System (1), 70 Spring 2011 UC Davis