1 unix internals – the new frontiers chapters 8 & 9 file systems
Post on 14-Dec-2015
220 Views
Preview:
TRANSCRIPT
1
UNIX Internals – The New FrontiersChapters 8 & 9
File Systems
2
Contents
The User Interface to Files File System File System Framework The Vnode/VFS Architecture Implementation Overview File-System-Dependent Objects Mounting a File System Operations on Files The System V File System(s5fs) S5fs Kernel
3
8.2 The User Interface files, directory, file descriptor, file systems File & Directories
File: logically a container for data A hierarchical, tree-structured name space Pathname: all the components in the path
from the root to the node, by “/” “.” & “..” Link: a directory entry for a file.
4
Directory tree
5
Operation on directory
dirp = opendir(const *filename); direntp = readdir (dirp); rewinddir(dirp); status = closedir(firp);
struct dirent { int_t d_ino;char d_name[NAME_MAX +1]; };
6
File Attributes Kept in the inode: index node File attributes:
File type Number of hard links File size Device ID Inode number User and Group Ids of the owner of the file. Timestamps Permissions and mode flags
7
Permissions and mode flags
0wner, group, others (3 x 3 bits) Read, write, execute (3 bits) Mode flags - apply to executable files
- suid, sgid – to set the user’s effective UID to that of the owner of the file, - stick – to retain file in swap area
8
System calls
link, unlink – to create and delete hard links utimes – to change the access and modify
timestamps, chown – to change the owner UID and GID, Chmode – to change permissions and mode flags.
9
File Descriptors fd = open (path, oflag, mode); fd is a per-process object.
10
File descriptors
11
File I/O Random and sequential access
lseek – random access nread = read(fd, buf, count); Write has similar semantics Operations are serialized In append mode offset pointer set to the
end of the file
12
Scatter-Gather I/O nbytes = writev(fd, iov, iovcnt);
13
File Locking Read and write are atomic. Advisory locks: protect from
cooperative processes, flock() in 4BSD; in SVR3 chmod must be enabled first
SVR4: r/w locks. Mandatory locks:kernel C library function lockf
14
8.3 File systems
Mount-on - a directory is covered by the mounted file system. - mount table (original) & vfs list (modern)
Restrictions - file cannot span file system, - each file system must reside on a single logical disk
15
16
Logical Disks A logical disk is a storage abstraction that the kernel
sees as a linear sequence of fixed sized, randomly accessible blocks.
newfs, mkfs, Traditional: partition – physical storage of a file
system Modern configurations: Volume (several disks combined), Disk mirroring Stripe sets RAID(Redundant Array of Inexpensive Disks)
17
Special files
Generalization to include all kinds of I/O related objects such as directories, symbolic links, hardware devices (disks, terminals, printers, psuedodevices such as the system memory, and communications abstractions such as pipes and sockets;
Problems with hard links – may not span file systems,can be created by superuser only, ownership problems,
18
Special files
Symbolic links – special file that points to another file (linked-to file); the data portion of the file contains the pathname of the linked-to file; may be stored in the I-node of the symbolic link ( more on this in Practical UNIX Programming pp.90-96);
Pipes – created by pipe system call, deleted by the kernel automatically
FIFOs - created by mknod system call, must be explicitly deleted;
19
8.5 File System Framework
Traditional UNIX can not support >1 types of FS.
The new developments (DOS, file sharing, RFS, NFS) require the framework to change. AT&T: file system switch Sun Microsystem: vnode/vfs DEC: gnode
SVR4:(AT&T+ vnode/vfs+NFS)-> de facto standard
20
8.6 The Vnode/Vfs Architecture
Objectives Support several file system types
simultaneously. Different disk partitions may contain
different types of file systems. Support for sharing files over a network. Vendors should be able to create their
own file system types and add them to the kernel.
21
Lessons from Device I/O Devices: block & character Character device switch:
struc cdevsw {
int (*d_open)();
int (*d_close)();
int (*d_read)();
int (*d_write)();
} cdevsw[ ];
Major device number: as the index
22
read system call(in traditional UNIX)
1) Use the file descriptor to get to the open file object;2) Check the entry to see if the file is open for read;3) Get the pointer to the in-core inode from this entry;4) Lock the inode so as to serialize access to the file;5) Check the inode mode field and find that the file is a cha
racter device file.6) Use the major device number to index into a table of cha
racter devices and obtain the cdevsw entry for this device;
7) From the cdevsw, obtain the pointer to the d_read routine for this device;
8) Invoke the d_read operation to perform the device-specific processing of the read request.
9) Unlock the inode and return to the user.
23
Lessons from Device I/O It is necessary to separate the file
subsystem code into file-system-independent code and file-system-dependent code
The interface between these two parts is defined by a set of generic functions that are called by the file system-independent code
24
Object Oriented Design
25
Overview of the Vnode/Vfs Interface
Vnode represents a file in the UNIX kernel.
Vfs represents a file system
26
)
27
base class data and operations pointers
v_data: inode(s5fs), rnode(NFS), tmpnode(tmpfs),
v_op: vnodeopsExample: to close the file associated with the vnode #define VOP_CLOSE(vp,…) (*((vp)->v_opclose))(vp,…)
28
VFS base class
29
8.7 Implementation Overview
Objectives Each operation must be carried out on behalf of the
current process. Certain operations may need to serialize access to the
file. The interface must be stateless and reentrant. FS implementation should be allowed to use global
resources, such as buffer cache. The interface should be usable by the server side The use of fixed-size static tables must be avoided.
30
Vnodes and Open Files The vnode is the fundamental
abstraction that represents an active file in the kernel.
access to a vnode: by a file descriptor by file-system-dependent data structures
31
Data structures
Reference count
32
The Vnode
struct vnode{u_short v_flag; u_short v_count; struct vfs *vfsmountedhere; struct vnodeops *v_op; struct vfs *vfsp;…}; // p242
33
Vnode Reference Count
It determines how long the vnode must remain in the kernel.
Reference versus lock: Acquire a reference:
Open a file A process holds a reference to its current directory. When a new file system is mounted Pathname traversal routine
file is deleted physically when reference count becomes zero.
34
The Vfs Object struct vfs {
struct vfs *vfs_next; struct vfsops * vfs_op; struct vnode *vfs_vnodecovered; int vfs_fstype; caddr_t vfs_data; dev_t vfs_dev; …
}; //p243
35
36
8.8 File-System-Dependent Objects
The Per-File Private Data Vnode is an abstract objects.
37
The vnodeops Vector struct vnodeops{
int (*vop_open)();
int (*vop_close)();
…
}; //p245
For ufs:
struct vnodeops ufs_vnodeops = {
ufs_open;
ufs_close;
…
}; //p246
38
39
File-System-Dependent Parts of the Vfs Layer
struct vfsops {
int (*vfs_mount)();
int (*vfs_unmount)();
int (*vfs_root)();
int (*vfs_statvfs)();
int (*vfs_sync)();
…
}; //p246
40
41
8.9 Mounting a File System
mount(spec, dir, flags, type, dataptr, datalen) //SVR4
Virtual File System Switch - a global table containing
one entry for each file system type. struct vfssw{
char *vsw_name;int (*vsw_init)();
struct vfsops * vsw_vfsops;
….
} vsfsw[];
42
mount Implementation
Adds the structure to the linked list headed by rootvfs.
Sets the vfs_op field to the vfsops vector specified in the switch entry.
Sets the vfs_vnodecovered field to point to the vnode of the mount point directory.
43
VFS_MOUNT processing
Verify permissions for the operation. Allocate and initialize the private data
object of the file system. Store a pointer to it in the vfs_data field
of the vfs object. Access the root directory of the file
system and initialize its vnode in memory.
44
8.10 Operations on Files
Pathname Traversal lookuppn(): u_cdir
1. v_type is of a directory2. “..” & system root – move on3. “..” & a mounted system root – access the mount point4. VOP_LOOKUP5. Not found, last one - success, else – error ENOENT6. A mount point - go to the mounted vfs root7. A symbolic link – translate it and append8. Release the directory9. Go back to the top of the loop10. Terminate, do not release the reference of the final
vnode
//p250
45
Opening a file
fd = open(pathname, mode)1. Allocate a descriptor2. Allocate an open file object3. Call lookuppn()4. Check the vnode for permissions5. Check for the operations6. Not exist, O_Creat, VOP_CREAT; ENOENT7. VOP_OPEN8. If O_TRUNC, VOP_SETATTR9. Initialize10. Return the index of the file descriptor
//p252
46
Other topics
File I/O File attributes User credentials Analysis Drawbacks of the SVR4 Implementation The 4.4 BSD Model
47
Chapter 9
File System Implementations
48
9.2 The System V File System(s5fs)
The layout of s5fs partition:
Directories: s5fs directory is a
special file containing a list of files and subdirectories.
B S inode list data blocks
49
Inodes The inode contains administrative
information,or meta data. The node list contains all the inodes. On-disk inode - see Tab. 9-1 In-core inode have more fields
50
Inode Fields
51
di_modeBit-fields
52
Block array of inode—di_addr
inode
10, 10K
256, 256K
256*256=65K, 65M
256*256*256=16M, 16G
53
The superblock Size in blocks of the file system Size in blocks of the inode list Number of free blocks and inodes Free block list Free inode list
54
Free block list
55
9.3 s5fs Kernel Organization
In-core Inodes The vnode Device ID Inode number of the file Flags for synchronization and cache management Pointers to keep the inode on a free list Pointers to keep the inode on a hash queue. Block number of last block read
56
Allocating and Reclaiming Inodes
Inode table(LRU) containing the active inodes
Reference count of a vnode ==0 the reclaim the inode as free
Iget()(allocating):
57
Inode lookup
s5lookup() Checks the directory name lookup cache Directory name lookup cache Miss? Reads
the directory one block at a time, searching the entries for the specified file name:Get it
If the file is in the directory, get the inode number, use iget() to locate the inode,
Inode in the table?get it: allocate a new inode, initialize, copy, put in the hash queue, also initialize the vnode(v_ops, v_data, vfs)
Return the pointer to the inode
58
File I/O (1) Read(to a user buffer address)
Fd-> the open file object, verify mode-> vnode-> get the rw-lock->call s5read()
Offset -> block number & the offset -> uiomove()-> call copyout()
The page not in memory?page fault->the handler->s5getpage()->call bmap()
logical to physical mapping, search vnode’s page list, not in?allocates a free page and call the disk driver to read the data from disk
Sleeps until the I/O completes. Before copying to user data space, verifies the user has access
s5read() returns, unlock, advances the offset, returns the number of bytes read
59
File I/O (2) Write:
Not immediately to disk May increase the file size May require the allocation of data blocks Read the entire block, write relevant data,
write back all the block
60
Allocating and reclaiming Inodes
When the reference count drops to 0.. When a file becomes inactive…. It is better to reuse inodes…………
61
Analysis of s5fs Reliability concern : super block Performance:
2 disk I/Os Blocks randomly located Block size: 512(SVR2), 1024(SVR3) Name: 14 characters Inodes limit: 65535
62
The Berkeley Fast File System
Hard disk structure On-disk organization
- Blocks and fragments
- Allocation policy FFS functionality enhancements
– long file names, - symbolic links, - other enhancements;
Analysis
63
Other file systems
Temporary file systems - RAM disk, mfs, tmpfs)
The Specfs File System The /proc File System
64
Linux Virtual File System
Uniform file system interface to user processes
Represents any conceivable file system’s general feature and behavior
Assumes files are objects that share basic properties regardless of the target file system
65
66
67
Primary Objects in VFS Superblock object
Represents a specific mounted file system Inode object
Represents a specific file Dentry object
Represents a specific directory entry File object
Represents an open file associated with a process
top related