file systeminterface-pre-final-formatting


Upload: marangburu42

Post on 18-Jan-2017



Outline

FILE CONCEPT
o File Attributes (Name, Identifier, Type, Location, Size, Protection, Time & Date, User ID)
o File Operations
o File Types
o File Structure
o Internal File Structure

ACCESS METHODS
o Sequential Access
o Direct Access
o Other Access Methods

DIRECTORY STRUCTURE
o Storage Structure
o Directory Overview
o Single-Level Directory
o Two-Level Directory
o Tree-Structured Directories
o Acyclic-Graph Directories
o General Graph Directory

FILE-SYSTEM MOUNTING

FILE SHARING
o Multiple Users
o Remote File Systems (The Client-Server Model, Distributed Information Systems, Failure Modes)
o Consistency Semantics (UNIX Semantics, Session Semantics, Immutable-Shared-Files Semantics)

PROTECTION
o Types of Access
o Access Control
o Other Protection Approaches and Issues

Contents

FILE CONCEPT

File Attributes

Different OSes keep track of different file attributes, including Name, Identifier (e.g. inode number), Type (Text, executable, other binary, etc.), Location (E.g., Hard drive), Size, Protection, Time & Date, User ID. Some systems give special significance to names, and particularly extensions (.exe, .txt, etc.), and some do not. Some extensions may be of significance to the OS (.exe), and others only to certain applications (.jpg).
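As a rough illustration of the attributes listed above, the sketch below reads them for a temporary file through Python's standard os.stat call. The helper name describe and the file name notes.txt are made up for this example, and the exact fields available vary by OS.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Collect the attributes discussed above for one file (hypothetical helper)."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": info.st_ino,  # inode number on UNIX-like systems
        "type": "regular file" if stat.S_ISREG(info.st_mode) else "other",
        "size": info.st_size,  # in bytes
        "protection": stat.filemode(info.st_mode),  # e.g. -rw-r--r--
        "modified": time.ctime(info.st_mtime),
        "user_id": info.st_uid,
    }


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w") as handle:
        handle.write("hello")
    attributes = describe(sample)
    for key, value in attributes.items():
        print(f"{key:>10}: {value}")
```

Note that the name lives in the directory entry rather than in the stat record, which is why the sketch takes it from the path.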

File Operations

The file ADT supports many common operations: Creating a file, Writing a file, Reading a file, Repositioning within a file, Deleting a file, Truncating a file.
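The operations above map fairly directly onto ordinary library calls. The following is only a sketch using Python's built-in file objects on a throwaway file (the name demo.bin is arbitrary), not a description of any particular OS interface.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.bin")

# Create the file and write ten bytes to it.
with open(path, "wb") as handle:
    handle.write(b"0123456789")

# Read, reposition, and truncate through one open handle.
with open(path, "rb+") as handle:
    first = handle.read(4)  # reads from the start of the file
    handle.seek(7)  # reposition the file pointer
    tail = handle.read()  # reads from byte 7 to the end
    handle.truncate(5)  # keep only the first five bytes

size_after = os.path.getsize(path)

# Delete the file (and the temporary directory that held it).
os.remove(path)
os.rmdir(folder)
still_exists = os.path.exists(path)
```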

1CPU-Scheduling (Galvin)


Information about currently open files is stored in an open-file table, containing for example:
o File pointer - records the current position in the file, for the next read or write access.
o File-open count - how many times the file has been opened (simultaneously by different processes) and not yet closed. When this counter reaches zero the file can be removed from the table.
o Disk location of the file.
o Access rights.

Some systems provide support for file locking.
o A shared lock is for reading only.
o An exclusive lock is for writing as well as reading.
o An advisory lock is informational only, and not enforced. (A "Keep Out" sign, which may be ignored.)
o A mandatory lock is enforced. (A truly locked door.)
UNIX uses advisory locks, and Windows uses mandatory locks.
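The open-count bookkeeping described above can be sketched as a toy table. This is a simplified model, not how any real kernel lays out its tables: the class name OpenFileTable and its fields are invented for illustration, and the per-open file pointer is left out.

```python
class OpenFileTable:
    """Toy system-wide open-file table keyed by file name (illustrative only)."""

    def __init__(self):
        self.entries = {}

    def open(self, name, disk_location, access):
        # First open creates the entry; later opens just bump the count.
        entry = self.entries.setdefault(
            name,
            {"open_count": 0, "disk_location": disk_location, "access": access},
        )
        entry["open_count"] += 1

    def close(self, name):
        entry = self.entries[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            # No process has the file open any more, so drop the entry.
            del self.entries[name]


table = OpenFileTable()
table.open("report.txt", disk_location=1042, access="read")
table.open("report.txt", disk_location=1042, access="read")
count_after_two_opens = table.entries["report.txt"]["open_count"]
table.close("report.txt")
table.close("report.txt")
entry_removed = "report.txt" not in table.entries
```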

File Types

Windows (and some other systems) use special file extensions to indicate the type of each file. Macintosh stores a creator attribute for each file, according to the program that first created it with the create() system call.

File Structure

Some files contain an internal structure, which may or may not be known to the OS. For the OS to support particular file formats increases the size and complexity of the OS.

UNIX treats all files as sequences of bytes, with no further consideration of the internal structure. (With the exception of executable binary programs, which it must know how to load and find the first executable statement, etc.)

Macintosh files have two forks - a resource fork, and a data fork. The resource fork contains information relating to the UI, such as icons and button images, and can be modified independently of the data fork, which contains the code or data as appropriate.

Internal File Structure

Disk files are accessed in units of physical blocks, typically 512 bytes or some power-of-two multiple thereof. (Larger physical disks use larger block sizes, to keep the range of block numbers within the range of a 32-bit integer.)

Internally, files are organized in logical units, which may be as small as a single byte, or may be larger, corresponding to some data record or structure size. The number of logical units that fit into one physical block determines the packing, and affects the amount of internal fragmentation (wasted space) that occurs. As a general rule, on average half a physical block is wasted for each file, so the larger the block size, the more space is lost to internal fragmentation.
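A small worked calculation may help here. The file size (1300 bytes), record size (100 bytes), and block sizes below are made-up numbers chosen only to show the arithmetic.

```python
def blocks_needed(file_size, block_size):
    """Number of whole physical blocks needed to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size


def wasted_bytes(file_size, block_size):
    """Internal fragmentation: allocated space minus space actually used."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# A 1300-byte file on 512-byte blocks versus 4096-byte blocks.
small_blocks = wasted_bytes(1300, 512)  # 3 blocks = 1536 bytes allocated
large_blocks = wasted_bytes(1300, 4096)  # 1 block = 4096 bytes allocated

# Packing: how many 100-byte logical records fit in one 512-byte block,
# and how many bytes are left over in each block.
records_per_block = 512 // 100
leftover_per_block = 512 % 100

print(small_blocks, large_blocks, records_per_block, leftover_per_block)
```

With these numbers the larger block size wastes far more of its single block, which is the trade-off the paragraph above describes.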

ACCESS METHODS

Sequential Access: A sequential access file emulates magnetic tape operation, and generally supports a few operations:
a) read next - read a record and advance the tape to the next position.
b) write next - write a record and advance the tape to the next position.
c) rewind
d) skip n records - may or may not be supported. n may be limited to positive numbers, or may be limited to +/- 1.

Direct Access: Jump to any record and read that record. Operations supported include:
o read n - read record number n. (Note an argument is now required.)
o write n - write record number n. (Note an argument is now required.)
o jump to record n - could be 0 or the end of file.
o query current record - used to return back to this record later.
Sequential access can be easily emulated using direct access. The inverse is complicated and inefficient.
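To see why sequential access is easy to emulate on top of direct access, consider the sketch below. Both class names are hypothetical, and the "file" is just an in-memory list of records; the only extra state the emulation needs is a current-position counter.

```python
class DirectFile:
    """Direct-access file modelled as a list of records (illustrative only)."""

    def __init__(self, records):
        self.records = list(records)

    def read(self, n):
        return self.records[n]

    def write(self, n, value):
        self.records[n] = value


class SequentialView:
    """Sequential operations built from read n / write n plus a position."""

    def __init__(self, direct_file):
        self.file = direct_file
        self.position = 0

    def read_next(self):
        record = self.file.read(self.position)
        self.position += 1
        return record

    def write_next(self, value):
        self.file.write(self.position, value)
        self.position += 1

    def rewind(self):
        self.position = 0


tape = SequentialView(DirectFile(["a", "b", "c"]))
first = tape.read_next()
tape.write_next("B")  # overwrites record 1
tape.rewind()
replayed = [tape.read_next() for _ in range(3)]
```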

Other Access Methods: An indexed access scheme can be easily built on top of a direct access system. Very large files may require a multi-tiered indexing scheme, i.e. indexes of indexes. (The book has a lot more relevant material on this, as it does for all chapters.)
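A single-level index over a direct-access file might look like the sketch below. It assumes fixed-length 16-byte records held in an in-memory stream; RECORD_SIZE, the function names, and the sample keys are all invented for illustration.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def write_records(stream, rows):
    """Store each value as one fixed-size record and return key -> record number."""
    index = {}
    for number, (key, value) in enumerate(rows):
        stream.seek(number * RECORD_SIZE)
        stream.write(value.encode("ascii").ljust(RECORD_SIZE))
        index[key] = number
    return index


def read_record(stream, number):
    """Direct access: jump straight to record `number` and read it."""
    stream.seek(number * RECORD_SIZE)
    return stream.read(RECORD_SIZE).decode("ascii").rstrip()


def lookup(stream, index, key):
    """Indexed access: consult the index, then do one direct read."""
    return read_record(stream, index[key])


data_file = io.BytesIO()
index = write_records(
    data_file,
    [("u100", "Ada Lovelace"), ("u200", "Alan Turing"), ("u300", "Grace Hopper")],
)
found = lookup(data_file, index, "u200")
```

For a very large file the index itself would be too big to keep in memory, which is where a second-level index over the first would come in.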

DIRECTORY STRUCTURE


Storage Structure: A disk can be used in its entirety for a file system. Alternatively a physical disk can be broken up into multiple partitions, slices, or mini-disks, each of which becomes a virtual disk and can have its own filesystem. (or be used for raw storage, swap space, etc.) Or, multiple physical disks can be combined into one volume, i.e. a larger virtual disk, with its own filesystem spanning the physical disks.

Directory Overview: Directory operations to be supported include:
a) Search for a file
b) Create a file (add to the directory)
c) Delete a file (erase from the directory)
d) List a directory (possibly ordered in different ways)
e) Rename a file (may change sorting order)
f) Traverse the file system

Single-Level Directory: Simple to implement, but each file must have a unique name.

Two-Level Directory: Each user gets their own directory space. File names only need to be unique within a given user's directory. A master file directory is used to keep track of each user's directory, and must be maintained when users are added to or removed from the system. A separate directory is generally needed for system (executable) files. Systems may or may not allow users to access other directories besides their own. If access to other directories is allowed, then provision must be made to specify the directory being accessed. If access is denied, then special consideration must be made for users to run programs located in system directories. A search path is the list of directories in which to search for executable programs, and can be set uniquely for each user.
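The master file directory and search path can be modelled with nested dictionaries. This is only a sketch: the user names, file names, and the find_program helper are hypothetical.

```python
# Master file directory: one entry per user directory, plus a system directory.
master_file_directory = {
    "system": {"ls": "<system ls>", "cc": "<system cc>"},
    "alice": {"notes.txt": "<alice's notes>", "cc": "<alice's own cc>"},
    "bob": {"notes.txt": "<bob's notes>"},
}


def find_program(name, search_path):
    """Return the first directory on the search path that contains `name`."""
    for directory in search_path:
        if name in master_file_directory.get(directory, {}):
            return directory
    return None


# Each user can have a different search path.
alice_cc = find_program("cc", ["alice", "system"])
bob_cc = find_program("cc", ["bob", "system"])
missing = find_program("vi", ["bob", "system"])
```

Both users have a file called notes.txt without conflict, and alice's search path finds her own cc before the system one.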

Tree-Structured Directories: This is an obvious extension to the two-tiered directory structure. Each user / process has the concept of a current directory from which all (relative) searches take place. Files may be accessed using either absolute pathnames (relative to the root of the tree) or relative pathnames (relative to the current directory.) Directories are stored the same as any other file in the system, except there is a bit that identifies them as directories, and they have some special structure that the OS understands.
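The difference between absolute and relative pathnames can be sketched with Python's posixpath module. The resolve helper and the example paths are assumptions for illustration, and the normalization here is purely textual (it does not consult a real file system or follow links).

```python
import posixpath


def resolve(path, current_directory):
    """Turn a pathname into an absolute one, relative to the current directory."""
    if posixpath.isabs(path):
        # Absolute pathnames are interpreted from the root of the tree.
        return posixpath.normpath(path)
    # Relative pathnames are interpreted from the current directory.
    return posixpath.normpath(posixpath.join(current_directory, path))


cwd = "/home/alice"
absolute = resolve("/etc/passwd", cwd)
relative = resolve("src/main.c", cwd)
parent = resolve("../bob/notes.txt", cwd)
```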

Acyclic-Graph Directories: When the same files need to be accessed in more than one place in the directory structure (e.g. because they are being shared by more than one user / process), it can be useful to provide an acyclic-graph structure. UNIX provides two types of links for implementing the acyclic-graph structure. A hard link (usually just called a link) involves multiple directory entries that all refer to the same file. Hard links are only valid for ordinary files in the same filesystem. A symbolic link involves a special file, containing information about where to find the linked file. Symbolic links may be used to link directories and/or files in other filesystems, as well as ordinary files in the current filesystem. Windows shortcuts play a role similar to symbolic links. (NTFS itself also supports hard links.) Hard links require a reference count, or link count, for each file, keeping track of how many directory entries are currently referring to this file. Whenever one of the references is removed the link count is reduced, and when it reaches zero, the disk space can be reclaimed.
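The link-count behaviour can be observed directly on a UNIX-like system. The sketch below assumes a filesystem that supports both hard and symbolic links, and uses made-up file names inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "report.txt")
    hard = os.path.join(folder, "report-hard.txt")
    soft = os.path.join(folder, "report-soft.txt")

    with open(original, "w") as handle:
        handle.write("data")

    # A hard link is a second directory entry for the same file.
    os.link(original, hard)
    count_with_link = os.stat(original).st_nlink
    same_file = os.path.samefile(original, hard)

    # A symbolic link is a separate small file holding the target's path.
    os.symlink(original, soft)

    # Removing one name only lowers the link count; the data survives.
    os.remove(original)
    count_after_remove = os.stat(hard).st_nlink
    with open(hard) as handle:
        surviving = handle.read()

    # The symbolic link still exists but now points at a missing name.
    soft_is_link = os.path.islink(soft)
    soft_is_dangling = not os.path.exists(soft)
```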

General-Graph Directory: If cycles are allowed in the graphs, then several problems can arise:
o Search algorithms can go into infinite loops. One solution is to not follow links in search algorithms. (Or not to follow symbolic links, and to only allow symbolic links to refer to directories.)
o Sub-trees can become disconnected from the rest of the tree and still not have their reference counts reduced to zero. Periodic garbage collection is required to detect and resolve this problem. (chkdsk in DOS and fsck in UNIX search for these problems, among others, even though cycles are not supposed to be allowed in either system. Disconnected disk blocks that are not marked as free are added back to the file system with made-up file names, and can usually be safely deleted.)
Refer to Figure 11.3.
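Both problems can be seen in a toy directory graph. The sketch below is not how chkdsk or fsck work internally; it simply marks everything reachable from the root (using a visited set so a cycle cannot cause an infinite loop, which is a different safeguard from the not-following-links approach mentioned above) and reports nodes that are unreachable yet still have a nonzero reference count. The node names are invented.

```python
def reachable_from(root, children):
    """Mark phase: every node reachable from root, safe in the presence of cycles."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited, so a cycle cannot trap the search
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


def reference_counts(children):
    """Count how many directory entries point at each node."""
    counts = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            counts[kid] = counts.get(kid, 0) + 1
    return counts


# "alice" links back to "home" (a cycle); "x" and "y" only point at each other.
children = {
    "root": ["home"],
    "home": ["alice"],
    "alice": ["home"],
    "x": ["y"],
    "y": ["x"],
}

live = reachable_from("root", children)
counts = reference_counts(children)
orphans = {node for node in children if node not in live}
orphans_with_references = {node for node in orphans if counts[node] > 0}
```

Here x and y each keep a reference count of one, so reference counting alone would never reclaim them.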

FILE SYSTEM MOUNTING

The basic idea behind mounting file systems is to combine multiple file systems into one large tree structure. The mount command is given a filesystem to mount and a mount point (directory) on which to attach it. Once a file system is mounted onto a mount point, any further references to that directory actually refer to the root of the mounted file system.

Any files (or sub-directories) that had been stored in the mount point directory prior to mounting the new filesystem are now hidden by the mounted filesystem, and are no longer available. For this reason some systems only allow mounting onto empty directories.

Filesystems can only be mounted by root, unless root has previously configured certain filesystems to be mountable onto certain pre-determined mount points. (E.g. root may allow users to mount floppy filesystems to /mnt or something like it.) Anyone can run the mount command to see what filesystems are currently mounted. Filesystems may be mounted read-only, or have other restrictions imposed.

The traditional Windows OS runs an extended two-tier directory structure, where the first tier of the structure separates volumes by drive letters, and a tree structure is implemented below that level. Macintosh runs a similar system, where each new volume that is found is automatically mounted and added to the desktop. More recent Windows systems allow filesystems to be mounted to any directory in the filesystem, much like UNIX.
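One way to picture how a path is routed to a mounted file system is a longest-prefix match against a mount table. The table contents, filesystem names, and the resolve helper below are assumptions for illustration, not the behaviour of any particular kernel.

```python
# Hypothetical mount table: mount point -> name of the mounted filesystem.
mount_table = {
    "/": "rootfs",
    "/home": "homefs",
    "/mnt/floppy": "floppyfs",
}


def resolve(path, table):
    """Return (filesystem, path inside that filesystem) for an absolute path."""
    matches = [
        point
        for point in table
        if path == point or path.startswith(point.rstrip("/") + "/")
    ]
    # The deepest (longest) matching mount point wins.
    point = max(matches, key=len)
    inner = path[len(point.rstrip("/")):] or "/"
    return table[point], inner


in_home = resolve("/home/alice/notes.txt", mount_table)
at_mount_point = resolve("/home", mount_table)
in_root = resolve("/homework/todo.txt", mount_table)
```

The last case shows why the match is done on whole path components: /homework is not under the /home mount point.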

FILE SHARING

Multiple Users: On a multi-user system, more information needs to be stored for each file:
o The owner (user) who owns the file, and who can control its access.
o The group of other user IDs that may have some special access to the file.
o What access rights are afforded to the owner (User), the Group, and to the rest of the world (the universe, a.k.a. Others.)
Some systems have more complicated access control, allowing or denying specific accesses to specifically named users or groups.

Remote File Systems: The advent of the Internet introduces issues for accessing files stored on remote computers. The original method was ftp, allowing individual files to be transported across systems as needed. Ftp can be either account and password controlled, or anonymous, not requiring any user name or password. Various forms of distributed file systems allow remote file systems to be mounted onto a local directory structure, and accessed using normal file access commands. (The actual files are still transported across the network as needed, possibly using ftp as the underlying transport mechanism.) The WWW has made it easy once again to access files on remote systems without mounting their filesystems, generally using anonymous file exchange (originally ftp, now mostly HTTP) as the underlying file transport mechanism.

The Client-Server Model: When one computer system remotely mounts a filesystem that is physically located on another system, the system which physically owns the files acts as a server, and the system which mounts them is the client. User IDs and group IDs must be consistent across both systems for the system to work properly. (I.e. this is most applicable across multiple computers managed by the same organization, shared by a common group of users.) The same computer can be both a client and a server. (E.g. cross-linked file systems.) The NFS (Network File System) is a classic example of such a system.

Distributed Information Systems: The Domain Name System, DNS, provides for a unique naming system across all of the Internet. Sun's Network Information Service, NIS, centralizes user names, host names, and similar information for the machines in a facility. Microsoft's Common Internet File System, CIFS, establishes a network login for each user on a networked system with shared file access. Older Windows systems used domains, and newer systems (2000, XP) use Active Directory. User names must match across the network for this system to be valid. A newer approach is the Lightweight Directory-Access Protocol, LDAP, which provides a secure single sign-on for all users to access all resources on a network. It is gaining in popularity, and has the maintenance advantage of combining authorization information in one central location.

Consistency Semantics: Consistency Semantics deals with the consistency between the views of shared files on a networked system. When one user changes the file, when do other users see the changes?

PROTECTION

Files must be kept safe for reliability (against accidental damage), and protection (against deliberate malicious access.) The former is usually managed with backup copies. This section discusses the latter.

Types of Access: The following low-level operations are often controlled:
o Read - View the contents of the file.
o Write - Change the contents of the file.
o Execute - Load the file onto the CPU and follow the instructions contained therein.
o Append - Add to the end of an existing file.
o Delete - Remove a file from the system.
o List - View the name and other attributes of files on the system.
Higher-level operations, such as copy, can generally be performed through combinations of the above.

Access Control: One approach is to have complicated Access Control Lists, ACL, which specify exactly what access is allowed or denied for specific users or groups. The AFS uses this system for distributed access. Control is very finely adjustable, but may be complicated, particularly when the specific users involved are unknown. (AFS allows some wild cards, so for example all users on a certain remote system may be trusted, or a given username may be trusted when accessing from any remote system.) UNIX uses a set of 9 access control bits, in three groups of three. These correspond to R, W, and X permissions for each of the Owner, Group, and Others. (See "man chmod" for full details.) The RWX bits control the following privileges for ordinary files and directories:

bit | Files | Directories
R | Read (view) file contents. | Read directory contents. Required to get a listing of the directory.
W | Write (change) file contents. | Change directory contents. Required to create or delete files.
X | Execute file contents as a program. | Access detailed directory information. Required to get a long listing, or to access any specific file in the directory.

Note that if a user has X but not R permissions on a directory, they can still access specific files, but only if they already know the name of the file they are trying to access.
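The nine access control bits can be decoded with a little bit arithmetic. The sketch below uses the octal mode 754 as an arbitrary example; the function names are made up for illustration.

```python
def permission_string(mode):
    """Render the low nine mode bits as rwx triples for owner, group, others."""
    letters = "rwxrwxrwx"
    bits = [(mode >> shift) & 1 for shift in range(8, -1, -1)]
    return "".join(letter if bit else "-" for letter, bit in zip(letters, bits))


def allowed(mode, who, access):
    """Check one permission, e.g. allowed(0o754, "group", "w")."""
    shift = {"owner": 6, "group": 3, "others": 0}[who]
    bit = {"r": 4, "w": 2, "x": 1}[access]
    return bool((mode >> shift) & bit)


mode = 0o754  # owner rwx, group r-x, others r--
text = permission_string(mode)
group_can_write = allowed(mode, "group", "w")
others_can_read = allowed(mode, "others", "r")
```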

In addition there are some special bits that can also be applied: The set user ID (SUID) bit and/or the set group ID (SGID) bit applied to executable files temporarily change the identity of whoever runs the program to match that of the owner / group of the executable program. This allows users running specific programs to have access to files (while running that program) which they would normally be unable to access. Setting of these two bits is usually restricted to root, and must be done with caution, as it introduces a potential security leak. Windows adjusts file access through a simple GUI.

Other Protection Approaches and Issues:
o Older systems which did not originally have multi-user file access permissions (DOS and older versions of Mac) must now be retrofitted if they are to share files on a network.
o Access to a file requires access to all the files along its path as well. In a cyclic directory structure, users may have different access to the same file accessed through different paths.
o Sometimes just the knowledge of the existence of a file of a certain name is a security (or privacy) concern. Hence the distinction between the R and X bits on UNIX directories.
