1 unix internals – the new frontiers chapters 8 & 9 file systems

67
1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

Upload: abbie-tustin

Post on 14-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

1

UNIX Internals – The New FrontiersChapters 8 & 9

File Systems

Page 2: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

2

Contents

The User Interface to Files File System File System Framework The Vnode/VFS Architecture Implementation Overview File-System-Dependent Objects Mounting a File System Operations on Files The System V File System(s5fs) S5fs Kernel

Page 3: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

3

8.2 The User Interface files, directory, file descriptor, file systems File & Directories

File: logically a container for data A hierarchical, tree-structured name space Pathname: all the components in the path

from the root to the node, by “/” “.” & “..” Link: a directory entry for a file.

Page 4: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

4

Directory tree

Page 5: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

5

Operation on directory

dirp = opendir(const *filename); direntp = readdir (dirp); rewinddir(dirp); status = closedir(firp);

struct dirent { int_t d_ino;char d_name[NAME_MAX +1]; };

Page 6: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

6

File Attributes Kept in the inode: index node File attributes:

File type Number of hard links File size Device ID Inode number User and Group Ids of the owner of the file. Timestamps Permissions and mode flags

Page 7: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

7

Permissions and mode flags

0wner, group, others (3 x 3 bits) Read, write, execute (3 bits) Mode flags - apply to executable files

- suid, sgid – to set the user’s effective UID to that of the owner of the file, - stick – to retain file in swap area

Page 8: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

8

System calls

link, unlink – to create and delete hard links utimes – to change the access and modify

timestamps, chown – to change the owner UID and GID, Chmode – to change permissions and mode flags.

Page 9: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

9

File Descriptors fd = open (path, oflag, mode); fd is a per-process object.

Page 10: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

10

File descriptors

Page 11: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

11

File I/O Random and sequential access

lseek – random access nread = read(fd, buf, count); Write has similar semantics Operations are serialized In append mode offset pointer set to the

end of the file

Page 12: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

12

Scatter-Gather I/O nbytes = writev(fd, iov, iovcnt);

Page 13: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

13

File Locking Read and write are atomic. Advisory locks: protect from

cooperative processes, flock() in 4BSD; in SVR3 chmod must be enabled first

SVR4: r/w locks. Mandatory locks:kernel C library function lockf

Page 14: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

14

8.3 File systems

Mount-on - a directory is covered by the mounted file system. - mount table (original) & vfs list (modern)

Restrictions - file cannot span file system, - each file system must reside on a single logical disk

Page 15: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

15

Page 16: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

16

Logical Disks A logical disk is a storage abstraction that the kernel

sees as a linear sequence of fixed sized, randomly accessible blocks.

newfs, mkfs, Traditional: partition – physical storage of a file

system Modern configurations: Volume (several disks combined), Disk mirroring Stripe sets RAID(Redundant Array of Inexpensive Disks)

Page 17: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

17

Special files

Generalization to include all kinds of I/O related objects such as directories, symbolic links, hardware devices (disks, terminals, printers, psuedodevices such as the system memory, and communications abstractions such as pipes and sockets;

Problems with hard links – may not span file systems,can be created by superuser only, ownership problems,

Page 18: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

18

Special files

Symbolic links – special file that points to another file (linked-to file); the data portion of the file contains the pathname of the linked-to file; may be stored in the I-node of the symbolic link ( more on this in Practical UNIX Programming pp.90-96);

Pipes – created by pipe system call, deleted by the kernel automatically

FIFOs - created by mknod system call, must be explicitly deleted;

Page 19: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

19

8.5 File System Framework

Traditional UNIX can not support >1 types of FS.

The new developments (DOS, file sharing, RFS, NFS) require the framework to change. AT&T: file system switch Sun Microsystem: vnode/vfs DEC: gnode

SVR4:(AT&T+ vnode/vfs+NFS)-> de facto standard

Page 20: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

20

8.6 The Vnode/Vfs Architecture

Objectives Support several file system types

simultaneously. Different disk partitions may contain

different types of file systems. Support for sharing files over a network. Vendors should be able to create their

own file system types and add them to the kernel.

Page 21: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

21

Lessons from Device I/O Devices: block & character Character device switch:

struc cdevsw {

int (*d_open)();

int (*d_close)();

int (*d_read)();

int (*d_write)();

} cdevsw[ ];

Major device number: as the index

Page 22: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

22

read system call(in traditional UNIX)

1) Use the file descriptor to get to the open file object;2) Check the entry to see if the file is open for read;3) Get the pointer to the in-core inode from this entry;4) Lock the inode so as to serialize access to the file;5) Check the inode mode field and find that the file is a cha

racter device file.6) Use the major device number to index into a table of cha

racter devices and obtain the cdevsw entry for this device;

7) From the cdevsw, obtain the pointer to the d_read routine for this device;

8) Invoke the d_read operation to perform the device-specific processing of the read request.

9) Unlock the inode and return to the user.

Page 23: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

23

Lessons from Device I/O It is necessary to separate the file

subsystem code into file-system-independent code and file-system-dependent code

The interface between these two parts is defined by a set of generic functions that are called by the file system-independent code

Page 24: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

24

Object Oriented Design

Page 25: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

25

Overview of the Vnode/Vfs Interface

Vnode represents a file in the UNIX kernel.

Vfs represents a file system

Page 26: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

26

)

Page 27: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

27

base class data and operations pointers

v_data: inode(s5fs), rnode(NFS), tmpnode(tmpfs),

v_op: vnodeopsExample: to close the file associated with the vnode #define VOP_CLOSE(vp,…) (*((vp)->v_opclose))(vp,…)

Page 28: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

28

VFS base class

Page 29: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

29

8.7 Implementation Overview

Objectives Each operation must be carried out on behalf of the

current process. Certain operations may need to serialize access to the

file. The interface must be stateless and reentrant. FS implementation should be allowed to use global

resources, such as buffer cache. The interface should be usable by the server side The use of fixed-size static tables must be avoided.

Page 30: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

30

Vnodes and Open Files The vnode is the fundamental

abstraction that represents an active file in the kernel.

access to a vnode: by a file descriptor by file-system-dependent data structures

Page 31: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

31

Data structures

Reference count

Page 32: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

32

The Vnode

struct vnode{u_short v_flag; u_short v_count; struct vfs *vfsmountedhere; struct vnodeops *v_op; struct vfs *vfsp;…}; // p242

Page 33: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

33

Vnode Reference Count

It determines how long the vnode must remain in the kernel.

Reference versus lock: Acquire a reference:

Open a file A process holds a reference to its current directory. When a new file system is mounted Pathname traversal routine

file is deleted physically when reference count becomes zero.

Page 34: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

34

The Vfs Object struct vfs {

struct vfs *vfs_next; struct vfsops * vfs_op; struct vnode *vfs_vnodecovered; int vfs_fstype; caddr_t vfs_data; dev_t vfs_dev; …

}; //p243

Page 35: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

35

Page 36: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

36

8.8 File-System-Dependent Objects

The Per-File Private Data Vnode is an abstract objects.

Page 37: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

37

The vnodeops Vector struct vnodeops{

int (*vop_open)();

int (*vop_close)();

}; //p245

For ufs:

struct vnodeops ufs_vnodeops = {

ufs_open;

ufs_close;

}; //p246

Page 38: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

38

Page 39: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

39

File-System-Dependent Parts of the Vfs Layer

struct vfsops {

int (*vfs_mount)();

int (*vfs_unmount)();

int (*vfs_root)();

int (*vfs_statvfs)();

int (*vfs_sync)();

}; //p246

Page 40: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

40

Page 41: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

41

8.9 Mounting a File System

mount(spec, dir, flags, type, dataptr, datalen) //SVR4

Virtual File System Switch - a global table containing

one entry for each file system type. struct vfssw{

char *vsw_name;int (*vsw_init)();

struct vfsops * vsw_vfsops;

….

} vsfsw[];

Page 42: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

42

mount Implementation

Adds the structure to the linked list headed by rootvfs.

Sets the vfs_op field to the vfsops vector specified in the switch entry.

Sets the vfs_vnodecovered field to point to the vnode of the mount point directory.

Page 43: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

43

VFS_MOUNT processing

Verify permissions for the operation. Allocate and initialize the private data

object of the file system. Store a pointer to it in the vfs_data field

of the vfs object. Access the root directory of the file

system and initialize its vnode in memory.

Page 44: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

44

8.10 Operations on Files

Pathname Traversal lookuppn(): u_cdir

1. v_type is of a directory2. “..” & system root – move on3. “..” & a mounted system root – access the mount point4. VOP_LOOKUP5. Not found, last one - success, else – error ENOENT6. A mount point - go to the mounted vfs root7. A symbolic link – translate it and append8. Release the directory9. Go back to the top of the loop10. Terminate, do not release the reference of the final

vnode

//p250

Page 45: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

45

Opening a file

fd = open(pathname, mode)1. Allocate a descriptor2. Allocate an open file object3. Call lookuppn()4. Check the vnode for permissions5. Check for the operations6. Not exist, O_Creat, VOP_CREAT; ENOENT7. VOP_OPEN8. If O_TRUNC, VOP_SETATTR9. Initialize10. Return the index of the file descriptor

//p252

Page 46: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

46

Other topics

File I/O File attributes User credentials Analysis Drawbacks of the SVR4 Implementation The 4.4 BSD Model

Page 47: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

47

Chapter 9

File System Implementations

Page 48: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

48

9.2 The System V File System(s5fs)

The layout of s5fs partition:

Directories: s5fs directory is a

special file containing a list of files and subdirectories.

B S inode list data blocks

Page 49: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

49

Inodes The inode contains administrative

information,or meta data. The node list contains all the inodes. On-disk inode - see Tab. 9-1 In-core inode have more fields

Page 50: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

50

Inode Fields

Page 51: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

51

di_modeBit-fields

Page 52: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

52

Block array of inode—di_addr

inode

10, 10K

256, 256K

256*256=65K, 65M

256*256*256=16M, 16G

Page 53: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

53

The superblock Size in blocks of the file system Size in blocks of the inode list Number of free blocks and inodes Free block list Free inode list

Page 54: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

54

Free block list

Page 55: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

55

9.3 s5fs Kernel Organization

In-core Inodes The vnode Device ID Inode number of the file Flags for synchronization and cache management Pointers to keep the inode on a free list Pointers to keep the inode on a hash queue. Block number of last block read

Page 56: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

56

Allocating and Reclaiming Inodes

Inode table(LRU) containing the active inodes

Reference count of a vnode ==0 the reclaim the inode as free

Iget()(allocating):

Page 57: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

57

Inode lookup

s5lookup() Checks the directory name lookup cache Directory name lookup cache Miss? Reads

the directory one block at a time, searching the entries for the specified file name:Get it

If the file is in the directory, get the inode number, use iget() to locate the inode,

Inode in the table?get it: allocate a new inode, initialize, copy, put in the hash queue, also initialize the vnode(v_ops, v_data, vfs)

Return the pointer to the inode

Page 58: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

58

File I/O (1) Read(to a user buffer address)

Fd-> the open file object, verify mode-> vnode-> get the rw-lock->call s5read()

Offset -> block number & the offset -> uiomove()-> call copyout()

The page not in memory?page fault->the handler->s5getpage()->call bmap()

logical to physical mapping, search vnode’s page list, not in?allocates a free page and call the disk driver to read the data from disk

Sleeps until the I/O completes. Before copying to user data space, verifies the user has access

s5read() returns, unlock, advances the offset, returns the number of bytes read

Page 59: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

59

File I/O (2) Write:

Not immediately to disk May increase the file size May require the allocation of data blocks Read the entire block, write relevant data,

write back all the block

Page 60: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

60

Allocating and reclaiming Inodes

When the reference count drops to 0.. When a file becomes inactive…. It is better to reuse inodes…………

Page 61: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

61

Analysis of s5fs Reliability concern : super block Performance:

2 disk I/Os Blocks randomly located Block size: 512(SVR2), 1024(SVR3) Name: 14 characters Inodes limit: 65535

Page 62: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

62

The Berkeley Fast File System

Hard disk structure On-disk organization

- Blocks and fragments

- Allocation policy FFS functionality enhancements

– long file names, - symbolic links, - other enhancements;

Analysis

Page 63: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

63

Other file systems

Temporary file systems - RAM disk, mfs, tmpfs)

The Specfs File System The /proc File System

Page 64: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

64

Linux Virtual File System

Uniform file system interface to user processes

Represents any conceivable file system’s general feature and behavior

Assumes files are objects that share basic properties regardless of the target file system

Page 65: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

65

Page 66: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

66

Page 67: 1 UNIX Internals – The New Frontiers Chapters 8 & 9 File Systems

67

Primary Objects in VFS Superblock object

Represents a specific mounted file system Inode object

Represents a specific file Dentry object

Represents a specific directory entry File object

Represents an open file associated with a process