operating system lecture notes
A.V.C.COLLEGE OF ENGINEERING
MANNAMPANDAL, MAYILADUTHURAI-609 305
COURSE MATERIAL
FOR THE SUBJECT OF
OPERATING SYSTEMS
Subject Code : CS 2254
Semester :IV SEMESTER
Department : B.E CSE
Academic Year : 2012-2013
Name of the Faculty : M.PARVATHI
Designation and Dept : Asst Prof /CSE
ANNA UNIVERSITY TIRUCHIRAPPALLI
Tiruchirappalli – 620 024
Regulations 2008
Curriculum
B.E. COMPUTER SCIENCE AND ENGINEERING
SEM IV
CS1253 – OPERATING SYSTEMS
(Common to CSE and IT)
UNIT I PROCESSES AND THREADS 9
Introduction to operating systems – Review of computer organization – Operating system structures – System calls – System programs – System structure – Virtual machines – Processes – Process concept – Process scheduling – Operations on processes – Cooperating processes – Interprocess communication – Communication in client-server systems – Case study – IPC in Linux – Threads – Multi-threading models – Threading issues – Case study – Pthreads library.
UNIT II PROCESS SCHEDULING AND SYNCHRONIZATION 10
CPU scheduling – Scheduling criteria – Scheduling algorithms – Multiple-processor scheduling – Real time scheduling – Algorithm evaluation – Case study – Process scheduling in Linux – Process synchronization – The critical-section problem – Synchronization hardware – Semaphores – Classic problems of synchronization – Critical regions – Monitors – Deadlock system model – Deadlock characterization – Methods for handling deadlocks – Deadlock prevention – Deadlock avoidance – Deadlock detection – Recovery from deadlock.
UNIT III STORAGE MANAGEMENT 9
Memory management – Background – Swapping – Contiguous memory allocation – Paging – Segmentation – Segmentation with paging – Virtual memory – Background – Demand paging – Process creation – Page replacement – Allocation of frames – Thrashing – Case study – Memory management in Linux.
UNIT IV FILE SYSTEMS 9
File system interface – File concept – Access methods – Directory structure – File system mounting – Protection – File system implementation – Directory implementation – Allocation methods – Free space management – Efficiency and performance – Recovery – Log structured file systems – Case studies – File system in Linux – File system in Windows XP.
UNIT V I/O SYSTEMS 8
I/O Systems – I/O Hardware – Application I/O interface – Kernel I/O subsystem – Streams – Performance – Mass-storage structure – Disk scheduling – Disk management – Swap-space management – RAID – Disk attachment – Stable storage – Tertiary storage – Case study – I/O in Linux.
Total: 45
TEXT BOOK
1. Silberschatz, Galvin and Gagne, “Operating System Concepts”, 6th Edition, Wiley India Pvt. Ltd., 2003.
REFERENCES
1. Tanenbaum, A.S., “Modern Operating Systems”, 2nd Edition, Pearson Education, 2004.
2. Gary Nutt, “Operating Systems”, 3rd Edition, Pearson Education, 2004.
3. William Stallings, “Operating Systems”, 4th Edition, Prentice Hall of India, 2003.
1. Introduction
1.1 Introduction
An operating system acts as an intermediary between the user of a computer and the computer hardware. The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient and efficient manner. An operating system is software that manages the computer hardware. The hardware must provide appropriate mechanisms to ensure the correct operation of the computer system and to prevent user programs from interfering with the proper operation of the system.
1.2 Operating System
1.2.1 Definition of Operating System
An operating system is a program that controls the execution of application programs and acts as an interface between the user of a computer and the computer hardware.
A more common definition is that the operating system is the one program running at all times on the computer (usually called the kernel), with all else being applications programs.
An Operating system is concerned with the allocation of resources and services, such as memory, processors, devices and information. The Operating System correspondingly includes programs to manage these resources, such as a traffic controller, a scheduler, memory management module, I/O programs, and a file system.
1.2.2 Functions of Operating System
An operating system performs three functions:
1. Convenience: An OS makes a computer more convenient to use.
2. Efficiency: An OS allows the computer system resources to be used in an efficient manner.
3. Ability to Evolve: An OS should be constructed in such a way as to permit the effective development, testing and introduction of new system functions without at the same time interfering with service.
1.2.3 Operating System as User Interface
Every general-purpose computer consists of hardware, an operating system, system programs and application programs. The hardware consists of memory, the CPU, the ALU, I/O devices, peripheral devices and storage devices. System programs include compilers, loaders, editors and the OS itself. Application programs include business programs and database programs.
Fig. 1.1 shows the conceptual view of a computer system.
Fig 1.1 Conceptual view of a computer system
Every computer must have an operating system to run other programs. The operating system controls and coordinates the use of the hardware among the various system programs and application programs for the various users. It simply provides an environment within which other programs can do useful work.
The operating system is a set of special programs that run on a computer system and allow it to work properly. It performs basic tasks such as recognizing input from the keyboard, keeping track of files and directories on the disk, sending output to the display screen and controlling peripheral devices.
An OS is designed to serve two basic purposes:
1. It controls the allocation and use of the computing system's resources among the various users and tasks.
2. It provides an interface between the computer hardware and the programmer that simplifies, and makes feasible, the coding, creation and debugging of application programs.
The operating system must support the following tasks:
1. Provide facilities to create and modify program and data files using an editor.
2. Provide access to the compiler for translating the user program from a high-level language to machine language.
3. Provide a loader program to move the compiled program code to the computer's memory for execution.
4. Provide routines that handle the details of I/O programming.
1.3 I/O System Management
The module that keeps track of the status of devices is called the I/O traffic controller. Each I/O device has a device handler that resides in a separate process associated with that device.
The I/O subsystem consists of:
1. A memory management component that includes buffering, caching and spooling.
2. A general device driver interface.
3. Drivers for specific hardware devices.
1.4 Assembler
The input to an assembler is an assembly language program. The output is an object program plus information that enables the loader to prepare the object program for execution. At one time, the computer programmer had at his disposal a basic machine that interpreted, through hardware, certain fundamental instructions. He would program this computer by writing a series of ones and zeros (machine language) and placing them into the memory of the machine.
1.5 Compiler
The high-level languages (examples are FORTRAN, COBOL, ALGOL and PL/I) are processed by compilers and interpreters. A compiler is a program that accepts a source program in a "high-level language" and produces a corresponding object program. An interpreter is a program that appears to execute a source program as if it were machine language. The same name (FORTRAN, COBOL etc.) is often used to designate both a compiler and its associated language.
1.6 Loader
A loader is a routine that places an object program into memory and prepares it for execution. There are various loading schemes: absolute, relocating and direct-linking. In general, the loader must load, relocate, and link the object program. In a simple loading scheme, the assembler outputs the machine language translation of a program on a secondary device and a loader is placed in core. The loader places into memory the machine language version of the user's program and transfers control to it. Since the loader program is much smaller than the assembler, this makes more core available to the user's program.
1.7 History of Operating System
Operating systems have been evolving through the years. The following table shows the history of OS.
Generation   Year          Electronic devices used    Types of OS and devices
First        1945 – 55     Vacuum tubes               Plug boards
Second       1955 – 1965   Transistors                Batch system
Third        1965 – 1980   Integrated circuit (IC)    Multiprogramming
Fourth       Since 1980    Large scale integration    PC
The 1960’s definition of an operating system is “the software that controls the
hardware”. However, today, due to microcode we need a better definition. We see an
operating system as the programs that make the hardware useable. In brief, an operating
system is the set of programs that controls a computer. Some examples of operating
systems are UNIX, Mach, MS-DOS, MS-Windows, Windows/NT, Chicago, OS/2,
MacOS, VMS, MVS, and VM.
Controlling the computer involves software at several levels. We will differentiate
kernel services, library services, and application-level services, all of which are part of
the operating system. Processes run Applications, which are linked together with libraries
that perform standard services. The kernel supports the processes by providing a path to
the peripheral devices. The kernel responds to service calls from the processes and
interrupts from the devices.
The core of the operating system is the kernel, a control program that functions in
privileged state (an execution context that allows all hardware instructions to be
executed), reacting to interrupts from external devices and to service requests and traps
from processes. Generally, the kernel is a permanent resident of the computer. It creates
and terminates processes and responds to their requests for service.
Batch Systems
A batch operating system is one where programs and data are collected
together in a batch before processing starts. A job is a predefined sequence of
commands, programs and data that are combined into a single unit.
Fig. 2.1 shows the memory layout for a simple batch system. Memory
management in a batch system is very simple. Memory is usually divided into
two areas: the operating system area and the user program area.
Scheduling is also simple in a batch system. Jobs are processed in the order of submission, i.e. in first come, first served fashion.
When a job completes execution, its memory is released and the output for the job is copied into an output spool for later printing.
Batch systems often provide simple forms of file management. Access to files is serial. Batch systems do not require any time-critical device management.
Batch systems are inconvenient for users because users cannot interact with their jobs to fix problems. There may also be long turnaround times. An example of such a system is generating monthly bank statements.
Advantages of Batch System
Much of the work of the operator is moved to the computer.
Performance increases, since a job can start as soon as the previous job finishes.
Disadvantages of Batch System
Turnaround time can be large from the user's standpoint.
Programs are difficult to debug.
A job could enter an infinite loop.
A job could corrupt the monitor, thus affecting pending jobs.
Due to the lack of a protection scheme, one batch job can affect pending jobs.
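The first come, first served order used by batch systems, and the long turnaround times it can cause, can be illustrated with a small simulation. This is only a sketch: the function name fcfs_schedule, the job names and the time units are assumptions made for illustration and do not come from any particular system.

```python
def fcfs_schedule(jobs):
    """Run jobs in submission order.

    jobs: list of (name, submit_time, run_time), sorted by submit_time.
    Returns a list of (name, start, finish, turnaround) tuples.
    """
    clock = 0
    results = []
    for name, submit_time, run_time in jobs:
        # The next job starts as soon as the previous one has finished
        # (or when it is submitted, if the CPU is already free).
        start = max(clock, submit_time)
        finish = start + run_time
        results.append((name, start, finish, finish - submit_time))
        clock = finish
    return results


if __name__ == "__main__":
    # Hypothetical workload: a long job submitted first delays the short ones.
    batch = [("payroll", 0, 10), ("report", 1, 2), ("backup", 2, 3)]
    for row in fcfs_schedule(batch):
        print(row)
```

In this made-up workload the two short jobs wait behind the long one, so their turnaround times are much larger than their run times.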
2.5 Time Sharing Systems
Multi-programmed batched systems provide an environment where the various system resources (for example, CPU, memory, peripheral devices) are utilized effectively.
Time sharing, or multitasking, is a logical extension of multiprogramming. Multiple jobs are executed by the CPU switching between them, but the switches occur so frequently that the users may interact with each program while it is running.
An interactive, or hands-on, computer system provides on-line communication between the user and the system. The user gives instructions to the operating system or to a program directly, and receives an immediate response. Usually, a keyboard is used to provide input, and a display screen (such as a cathode-ray tube (CRT) or monitor) is used to provide output.
If users are to be able to access both data and code conveniently, an on-line file system must be available. A file is a collection of related information defined by its creator. Batch systems are appropriate for executing large jobs that need little interaction.
Time-sharing systems were developed to provide interactive use of a computer system at a reasonable cost. A time-shared operating system uses CPU scheduling and multiprogramming to provide each user with a small portion of a time-shared computer. Each user has at least one separate program in memory. A program that is loaded into memory and is executing is commonly referred to as a process. When a process executes, it typically executes for only a short time before it either finishes or needs to perform I/O. I/O may be interactive; that is, output is to a display for the user and input is from a user keyboard. Since interactive I/O typically runs at people speeds, it may take a long time to complete.
A time-shared operating system allows the many users to share the computer simultaneously. Since each action or command in a time-shared system tends to be short, only a little CPU time is needed for each user. As the system switches rapidly from one user to the next, each user is given the impression that she has her own computer, whereas actually one computer is being shared among many users.
Time-sharing operating systems are even more complex than are multi-programmed operating systems. As in multiprogramming, several jobs must be kept simultaneously in memory, which requires some form of memory management and protection.
2.6 Multiprogramming
When two or more programs are in memory at the same time, sharing the processor, the system is referred to as a multiprogramming operating system. Multiprogramming assumes a single processor that is being shared. It increases CPU utilization by organizing jobs so that the CPU always has one to execute.
Fig. 2.2 shows the memory layout for a multiprogramming system.
The operating system keeps several jobs in memory at a time. This set of jobs is a subset of the jobs kept in the job pool. The operating system picks one of the jobs in memory and begins to execute it.
Multiprogrammed systems provide an environment in which the various system resources are utilized effectively, but they do not provide for user interaction with the computer system.
Having several programs in memory at the same time requires some form of memory management.
A multiprogramming operating system monitors the state of all active programs and system resources. This ensures that the CPU is never idle unless there are no jobs.
Advantages
1. High CPU utilization.
2. It appears that many programs are allotted the CPU almost simultaneously.
Disadvantages
1. CPU scheduling is required.
2. To accommodate many jobs in memory, memory management is required.
2.7 Spooling
Spooling is an acronym for simultaneous peripheral operations on line. Spooling refers to putting jobs in a buffer, a special area in memory or on a disk where a device can access them when it is ready.
Spooling is useful because devices access data at different rates. The buffer provides a waiting station where data can rest while the slower device catches up. Fig 2.3 shows the spooling.
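The buffering idea behind spooling can be sketched with a simple queue. The class name Spooler and its methods are hypothetical names chosen for this illustration. A real spooler would keep the jobs on disk and drive an actual device.

```python
from collections import deque


class Spooler:
    """Buffer output jobs so a fast program never waits for a slow device."""

    def __init__(self):
        # The spool: jobs wait here in arrival order.
        self._queue = deque()

    def submit(self, job_name, data):
        """Called by a program. Returns at once; the job is only queued."""
        self._queue.append((job_name, data))

    def pending(self):
        """Number of jobs still waiting for the device."""
        return len(self._queue)

    def service_one(self):
        """Called when the device is ready: hand over the oldest waiting job."""
        if not self._queue:
            return None
        return self._queue.popleft()
```

The program calling submit continues immediately, while the slower device drains the queue at its own pace through service_one.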
System Components
Even though not all systems have the same structure, many modern operating
systems share the same goal of supporting the following types of system components.
Process Management
The operating system manages many kinds of activities ranging from user
programs to system programs like printer spooler, name servers, file server etc. Each of
these activities is encapsulated in a process. A process includes the complete execution
context (code, data, PC, registers, OS resources in use etc.)
It is important to note that a process is not a program. A process is only ONE
instance of a program in execution. Many processes can be running the same
program. The five major activities of an operating system in regard to process
management are:
Creation and deletion of user and system processes.
Suspension and resumption of processes.
A mechanism for process synchronization.
A mechanism for process communication.
A mechanism for deadlock handling.
Main-Memory Management
Primary memory, or main memory, is a large array of words or bytes. Each word
or byte has its own address. Main memory provides storage that can be accessed directly by
the CPU. That is to say, for a program to be executed, it must be in main memory.
The major activities of an operating system in regard to memory management are:
Keep track of which parts of memory are currently being used and by whom.
Decide which process is loaded into memory when memory space becomes available.
Allocate and deallocate memory space as needed.
File Management
A file is a collection of related information defined by its creator. Computers can
store files on disk (secondary storage), which provides long-term storage. Some
examples of storage media are magnetic tape, magnetic disk and optical disk. Each of
these media has its own properties, like speed, capacity, data transfer rate and access
methods.
File systems are normally organized into directories to ease their use.
These directories may contain files and other directories.
The five major activities of an operating system in regard to file
management are:
The creation and deletion of files.
The creation and deletion of directories.
The support of primitives for manipulating files and directories.
The mapping of files onto secondary storage.
The back up of files on stable storage media.
I/O System Management
I/O subsystem hides the peculiarities of specific hardware devices from the user.
Only the device driver knows the peculiarities of the specific device to which it is
assigned.
Secondary-Storage Management
Generally speaking, systems have several levels of storage, including primary
storage, secondary storage and cache storage. Instructions and data must be placed in
primary storage or cache to be referenced by a running program. Because main memory
is too small to accommodate all data and programs, and its data are lost when power is
lost, the computer system must provide secondary storage to back up main memory.
Secondary storage consists of tapes, disks, and other media designed to hold information
that will eventually be accessed in primary storage. Storage (primary, secondary, cache) is
ordinarily divided into bytes or words consisting of a fixed number of bytes. Each
location in storage has an address; the set of all addresses available to a program is called
an address space.
The three major activities of an operating system in regard to secondary storage
management are:
Managing the free space available on the secondary-storage device.
Allocation of storage space when new files have to be written.
Scheduling the requests for disk access.
Networking
A distributed system is a collection of processors that do not share memory,
peripheral devices, or a clock. The processors communicate with one another through
communication lines, called a network. The communication-network design must consider
routing and connection strategies, and the problems of contention and security.
Protection System
If a computer system has multiple users and allows the concurrent execution of
multiple processes, then the various processes must be protected from one another's
activities. Protection refers to a mechanism for controlling the access of programs,
processes, or users to the resources defined by a computer system.
Command Interpreter System
A command interpreter is an interface between the operating system and the user. The
user gives commands, which are executed by the operating system (usually by turning them
into system calls). The main function of a command interpreter is to get and execute the
next user-specified command. The command interpreter is usually not part of the kernel,
since multiple command interpreters (shells, in UNIX terminology) may be supported by an
operating system, and they do not really need to run in kernel mode. There are two main
advantages to separating the command interpreter from the kernel.
First, if we want to change the way the command interpreter looks, i.e., to
change its interface, we are able to do so only if the command
interpreter is separate from the kernel. A user cannot change the code of the kernel, and so could not
modify the interface.
Second, if the command interpreter is a part of the kernel, it is possible for a malicious
process to gain access to parts of the kernel that it should not have. To avoid this
scenario it is advantageous to have the command interpreter separate from the kernel.
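The "get and execute the next command" loop of a command interpreter can be sketched as an ordinary user-level program. This is a toy illustration, not a real shell: the command set (pwd, echo, ls) and the function names are assumptions chosen for the example. The handlers rely on Python's os module, whose functions wrap the underlying system calls.

```python
import os
import shlex


def cmd_pwd(args):
    return os.getcwd()                     # wraps the getcwd system call


def cmd_echo(args):
    return " ".join(args)


def cmd_ls(args):
    path = args[0] if args else "."
    return "\n".join(sorted(os.listdir(path)))   # reads a directory via the OS


# Table of commands this toy interpreter understands (hypothetical set).
COMMANDS = {"pwd": cmd_pwd, "echo": cmd_echo, "ls": cmd_ls}


def execute(line):
    """Parse one command line and run it; return the text to display."""
    words = shlex.split(line)
    if not words:
        return ""
    handler = COMMANDS.get(words[0])
    if handler is None:
        return f"{words[0]}: command not found"
    return handler(words[1:])


def repl():
    """Get and execute the next user-specified command until 'exit' or EOF."""
    while True:
        try:
            line = input("$ ")
        except EOFError:
            break
        if line.strip() == "exit":
            break
        print(execute(line))
```

Because the interpreter runs entirely in user mode, its interface can be changed, or a different interpreter substituted, without touching the kernel.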
Operating Systems Services
The following five services are provided by an operating system for the convenience of
the users.
Program Execution
The purpose of a computer system is to allow the user to execute programs. So
the operating system provides an environment where the user can conveniently run
programs. The user does not have to worry about memory allocation, multitasking
or anything else. These things are taken care of by the operating system.
Running a program involves allocating and deallocating memory, and CPU scheduling in
the case of multiple processes. These functions cannot be given to user-level programs. So
user-level programs cannot help the user to run programs independently, without help
from the operating system.
I/O Operations
Each program requires input and produces output. This involves the use of I/O.
The operating system hides from the user the details of the underlying hardware for the I/O. All
the user sees is that the I/O has been performed, without any details. So the operating
system, by providing I/O, makes it convenient for users to run programs.
For efficiency and protection, users cannot control I/O directly, so this service cannot be provided
by user-level programs.
File System Manipulation
The output of a program may need to be written into new files or input taken from
some files. The operating system provides this service. The user does not have to worry
about secondary storage management. The user gives a command for reading or writing to a
file and sees the task accomplished. Thus the operating system makes it easier for user
programs to accomplish their task.
This service involves secondary storage management. The speed of I/O that
depends on secondary storage management is critical to the speed of many programs, and
hence it is best left to the operating system to manage it rather than giving
individual users control of it. It is not difficult for user-level programs to provide
these services, but for the above-mentioned reasons it is best if this service is left with
the operating system.
Communications
There are instances where processes need to communicate with each other to
exchange information. The processes may be running on the same computer or
on different computers. By providing this service the operating system
relieves the user of the worry of passing messages between processes. In cases where
messages need to be passed to processes on other computers through a network, this can
be done by user programs. The user program may be customized to the specifics of
the hardware through which the message transits and provides the service interface to the
operating system.
Error Detection
An error in one part of the system may cause malfunctioning of the complete
system. To avoid such a situation the operating system constantly monitors the system to
detect errors. This relieves the user of the worry of errors propagating to various
parts of the system and causing malfunctioning.
This service cannot be handled by user programs because it involves
monitoring and, in some cases, altering areas of memory, deallocating the memory of a faulty
process, or taking the CPU away from a process that goes into an infinite loop.
These tasks are too critical to be handed over to user programs. A user program, if
given these privileges, can interfere with the correct (normal) operation of the operating
system.
System Calls and System Programs
System calls provide an interface between a process and the operating system.
System calls allow user-level processes to request services from the operating
system which the process itself is not allowed to perform. A system call causes a trap; in handling the trap, the operating
system enters kernel mode, where it has access to privileged instructions, and
can perform the desired service on behalf of the user-level process. It is because of the
critical nature of these operations that the operating system itself performs them every time they are
needed. For example, for I/O a process invokes a system call telling the operating system
to read or write a particular area, and this request is satisfied by the operating system.
System programs provide basic functionality to users so that they do not need to write
their own environment for program development (editors, compilers) and program
execution (shells). In some sense, they are bundles of useful system calls.
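The I/O example above can be made concrete with a short sketch. Python's os.open, os.write, os.read and os.close are thin wrappers around the corresponding operating system calls, so each call below is a request that the kernel carries out on behalf of the process. The function name write_then_read and the scratch file name are assumptions made for this illustration.

```python
import os
import tempfile


def write_then_read(message):
    """Write bytes to a scratch file and read them back with low-level calls."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "demo.txt")   # hypothetical scratch file
        # Ask the OS to create the file and return a file descriptor.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, message)          # request: write these bytes
        finally:
            os.close(fd)
        fd = os.open(path, os.O_RDONLY)
        try:
            data = os.read(fd, len(message))   # request: read up to n bytes
        finally:
            os.close(fd)
        return data
```

The process never touches the disk hardware itself. It only names the file and the data, and the operating system performs the privileged work.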
Layered Approach Design
In the layered approach, the system is easier to debug and modify, because changes affect only
limited portions of the code, and the programmer does not have to know the details of the
other layers. Information is also kept only where it is needed and is accessible only in
certain ways, so bugs affecting that data are limited to a specific module or layer.
Mechanisms and Policies
Policies specify what is to be done, while mechanisms specify how it is to be
done. For instance, the timer construct for ensuring CPU protection is a mechanism. On the
other hand, the decision of how long the timer is set for a particular user is a policy
decision.
The separation of mechanism and policy is important to provide flexibility to a
system. If the interface between mechanism and policy is well defined, the change of
policy may affect only a few parameters. On the other hand, if interface between these
two is vague or not well defined, it might involve much deeper change to the system.
Once the policy has been decided it gives the programmer the choice of using his/her
own implementation. Also, the underlying implementation may be changed for a more
efficient one without much trouble if the mechanism and policy are well defined.
Specifically, separating these two provides flexibility in a variety of ways.
First, the same mechanism can be used to implement a variety of policies, so
changing the policy might not require the development of a new mechanism, but just a
change in parameters for that mechanism from a library of mechanisms.
Second, the mechanism can be changed, for example to increase its efficiency or
to move to a new platform, without changing the overall policy.
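The timer example can be sketched in code to show how one mechanism serves several policies. Here the mechanism is a loop that preempts a job when its timer expires, and the policy is a function that decides how long the timer is set for each job. The names run_with_timer, equal_share and favour_interactive, and the job names, are hypothetical. The round-robin requeueing is an assumption made for the sketch.

```python
from collections import deque


def run_with_timer(jobs, quantum_for):
    """Mechanism: preempt each job when its timer expires and requeue it.

    jobs: dict mapping job name to remaining CPU time units.
    quantum_for: policy function, job name -> timer length (must be > 0).
    Returns the order in which jobs finish.
    """
    ready = deque(jobs.items())
    finished = []
    while ready:
        name, remaining = ready.popleft()
        quantum = quantum_for(name)        # the only place policy is consulted
        remaining -= min(quantum, remaining)
        if remaining == 0:
            finished.append(name)
        else:
            ready.append((name, remaining))   # timer expired: back of the queue
    return finished


def equal_share(name):
    """Policy A: every job gets the same timer length."""
    return 2


def favour_interactive(name):
    """Policy B: a hypothetical 'editor' job gets a longer timer."""
    return 6 if name == "editor" else 1
```

Swapping equal_share for favour_interactive changes which job finishes first, yet run_with_timer itself is not modified.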
Definition of Process
The notion of process is central to the understanding of operating systems. There are
quite a few definitions presented in the literature, but no "perfect" definition has yet
appeared.
Definition
The term "process" was first used by the designers of MULTICS in the 1960s. Since
then, the term process has been used somewhat interchangeably with 'task' or 'job'. The process
has been given many definitions, for instance:
A program in Execution.
An asynchronous activity.
The 'animated spirit' of a procedure in execution.
The entity to which processors are assigned.
The 'dispatchable' unit.
Many more definitions have been given.
As we can see from the above, there is no universally agreed upon definition, but
the definition "Program in Execution" seems to be the most frequently used. This is the
concept we will use in the present study of operating systems.
Now that we have agreed upon the definition of process, the question is what the
relation between process and program is. Is it the same beast with a different name? Or is it
that when this beast is sleeping (not executing) it is called a program, and when it is executing
it becomes a process? To be precise, a process is not the same as a program. In the
following discussion we point out some of the differences between process and program.
A process is more than the program code. A
process is an 'active' entity, as opposed to a program, which is considered to be a 'passive' entity.
A program is an algorithm expressed in some suitable notation (e.g., a
programming language). Being passive, a program is only a part of a process. A process, on
the other hand, includes:
Current value of the Program Counter (PC)
Contents of the processor's registers
Values of the variables
The process stack (SP), which typically contains temporary data such as subroutine
parameters, return addresses, and temporary variables.
A data section that contains global variables.
A process is the unit of work in a system.
In the process model, all software on the computer is organized into a number of sequential
processes. A process includes PC, registers, and variables. Conceptually, each process
has its own virtual CPU. In reality, the CPU switches back and forth among processes.
(The rapid switching back and forth is called multiprogramming).
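The difference between the passive program and the active process, and the way one CPU switches among processes, can be sketched as follows. The Process class and run_interleaved function are hypothetical names for this illustration. The "instructions" are plain strings rather than real machine code.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """The saved context of one process (a simplified process image)."""
    name: str
    program: list                  # passive part: the list of instructions
    pc: int = 0                    # program counter: index of next instruction
    registers: dict = field(default_factory=dict)
    stack: list = field(default_factory=list)

    def finished(self):
        return self.pc >= len(self.program)


def run_interleaved(processes, steps_per_turn=1):
    """Switch one CPU among processes; return the trace of executed steps."""
    trace = []
    while any(not p.finished() for p in processes):
        for p in processes:
            for _ in range(steps_per_turn):
                if p.finished():
                    break
                trace.append((p.name, p.program[p.pc]))
                # The context lives in p, so p resumes from here next turn.
                p.pc += 1
    return trace
```

Two Process objects may share the same program list and still have separate program counters, which is what makes them two distinct processes running one program.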
Process State
The process state consists of everything necessary to resume the process execution if it is
somehow put aside temporarily. The process state consists of at least the following:
Code for the program.
Program's static data.
Program's dynamic data.
Program's procedure call stack.
Contents of general purpose registers.
Contents of program counter (PC).
Contents of program status word (PSW).
Operating system resources in use.
Process Operations
Process Creation
In general-purpose systems, some way is needed to create processes as needed during
operation. There are four principal events led to processes creation.
System initialization.
Execution of a process Creation System calls by a running process.
A user request to create a new process.
Initialization of a batch job.
Foreground processes interact with users. Background processes stay in the background,
sleeping, but spring to life to handle activity such as email, web pages,
printing, and so on. Background processes are called daemons.
A process may create a new process through a process-creation system call such as 'fork'.
This call creates an exact clone of the calling process. When it
does so, the creating process is called the parent process and the created one is called the
child process. Only one parent is needed to create a child process. Note that, unlike plants and
animals that use sexual reproduction, a process has only one parent. This creation of
processes yields a hierarchical structure of processes like the one in the figure.
Notice that each child has only one parent but each parent may have many children. Immediately
after the fork, the two processes, the parent and the child, have the same memory image, the
same environment strings and the same open files. However, the
parent and child have their own distinct address spaces. If either process changes a word in
its address space, the change is not visible to the other process.
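The separation of address spaces after a fork can be sketched as follows. This is a minimal illustration, assuming a UNIX-like system where Python's os.fork is available; the function name fork_demo and the pipe used to report the child's value are illustrative choices, not part of the notes.

```python
import os

def fork_demo():
    """Fork a child, let it modify its copy of a variable, and compare."""
    value = 10
    read_fd, write_fd = os.pipe()      # channel for the child to report back
    pid = os.fork()                    # child starts as a clone of the parent
    if pid == 0:
        # Child: change its own copy of 'value' and report it to the parent.
        os.close(read_fd)
        value += 5
        os.write(write_fd, str(value).encode())
        os.close(write_fd)
        os._exit(0)                    # leave the child without running parent code
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return value, child_value

if __name__ == "__main__":
    parent_value, child_value = fork_demo()
    print("parent sees", parent_value, "child saw", child_value)
```

The parent still sees its original value even though the child changed its own copy, which is the behaviour described above.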
Following are some reasons for creation of a process
User logs on.
User starts a program.
The operating system creates a process to provide a service, e.g., to manage a printer.
Some program starts another process, e.g., Netscape calls xv to display a picture.
Process Termination
A process terminates when it finishes executing its last statement. Its resources
are returned to the system, it is purged from any system lists or tables, and its process
control block (PCB) is erased i.e., the PCB's memory space is returned to a free memory
pool. A process terminates, usually for one of the following reasons:
Normal Exit
Most processes terminate because they have done their job. This call is exit in
UNIX.
Error Exit
The process discovers a fatal error. For example, a user tries to compile a
program that does not exist.
Fatal Error
An error caused by the process, often due to a bug in the program, for example executing an
illegal instruction, referencing non-existent memory or dividing by zero.
Killed by another Process
A process executes a system call telling the operating system to terminate some
other process. In UNIX, this call is kill. In some systems, when a process terminates, all
processes it created are killed as well (UNIX does not work this way).
Process States
A process goes through a series of discrete process states.
New State
The process being created.
Terminated State
The process has finished execution.
Blocked (waiting) State
When a process blocks, it does so because logically it cannot continue, typically
because it is waiting for input that is not yet available. Formally, a process is said to be
blocked if it is waiting for some event to happen (such as an I/O completion) before it can
proceed. In this state a process is unable to run until some external event happens.
Running State
A process is said to be running if it currently has the CPU, that is, it is actually using
the CPU at that particular instant.
Ready State
A process is said to be ready if it could use a CPU if one were available. It is runnable but
temporarily stopped to let another process run.
Logically, the 'Running' and 'Ready' states are similar. In both cases the process is willing
to run, only in the case of 'Ready' state, there is temporarily no CPU available for it. The
'Blocked' state is different from the 'Running' and 'Ready' states in that the process cannot
run, even if the CPU is available.
Process State Transitions
Following are the six (6) possible transitions among the above-mentioned five (5) states.
Transition 1
occurs when a process discovers that it cannot continue. If a running process
initiates an I/O operation before its allotted time expires, the running process voluntarily
relinquishes the CPU.
This state transition is:
Block (process-name): Running → Blocked.
Transition 2
occurs when the scheduler decides that the running process has run long enough and it is
time to let another process have CPU time.
This state transition is:
Time-Run-Out (process-name): Running → Ready.
Transition 3
occurs when all other processes have had their share and it is time for the first process to
run again
This state transition is:
Dispatch (process-name): Ready → Running.
Transition 4
occurs when the external event for which a process was waiting (such as arrival of input)
happens.
This state transition is:
Wakeup (process-name): Blocked → Ready.
Transition 5
occurs when the process is created.
This state transition is:
Admitted (process-name): New → Ready.
Transition 6
occurs when the process has finished execution.
This state transition is:
Exit (process-name): Running → Terminated.
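The five states and six transitions listed above can be written down as a small lookup table. This is only a sketch; the names State, TRANSITIONS, next_state and run_events are illustrative and do not come from any real operating system.

```python
from enum import Enum

class State(Enum):
    NEW = "New"
    READY = "Ready"
    RUNNING = "Running"
    BLOCKED = "Blocked"
    TERMINATED = "Terminated"

# The six transitions from the notes, keyed by (event, current state).
TRANSITIONS = {
    ("block", State.RUNNING): State.BLOCKED,        # Transition 1
    ("time_run_out", State.RUNNING): State.READY,   # Transition 2
    ("dispatch", State.READY): State.RUNNING,       # Transition 3
    ("wakeup", State.BLOCKED): State.READY,         # Transition 4
    ("admitted", State.NEW): State.READY,           # Transition 5
    ("exit", State.RUNNING): State.TERMINATED,      # Transition 6
}

def next_state(state, event):
    """Return the state reached by 'event', or raise if the move is illegal."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {state.value}") from None

def run_events(events, start=State.NEW):
    """Apply a sequence of events to a process that starts in 'start'."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state

if __name__ == "__main__":
    life = ["admitted", "dispatch", "block", "wakeup", "dispatch", "exit"]
    print(run_events(life).value)
```

Note that there is no direct move from Blocked to Running in the table: a woken process must first become Ready and then be dispatched.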
Process Control Block
A process in an operating system is represented by a data structure known as a process
control block (PCB) or process descriptor. The PCB contains important information
about the specific process including
The current state of the process i.e., whether it is ready, running, waiting, or whatever.
Unique identification of the process in order to track "which is which" information.
A pointer to parent process.
Similarly, a pointer to child process (if it exists).
The priority of process (a part of CPU scheduling information).
Pointers to locate memory of processes.
A register save area.
The processor it is running on.
The PCB is a central store that allows the operating system to locate key information
about a process. Thus, the PCB is the data structure that defines a process to the operating
system.
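As a rough sketch, the PCB fields listed above could be grouped into one record like this. The field names and the helper create_child are hypothetical; a real kernel keeps far more information and does not use Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)  # compare PCBs by identity; parent/child links are cyclic
class PCB:
    pid: int                                    # unique identification
    state: str = "New"                          # current state of the process
    parent: Optional["PCB"] = None              # pointer to the parent process
    children: List["PCB"] = field(default_factory=list)   # pointers to children
    priority: int = 0                           # CPU scheduling information
    memory_pointers: Dict[str, int] = field(default_factory=dict)  # where its memory is
    saved_registers: Dict[str, int] = field(default_factory=dict)  # register save area
    cpu: Optional[int] = None                   # processor it is running on

def create_child(parent, pid):
    """Create a PCB for a new child and link it into the process hierarchy."""
    child = PCB(pid=pid, parent=parent, priority=parent.priority)
    parent.children.append(child)
    return child

if __name__ == "__main__":
    init = PCB(pid=1, state="Running", priority=5)
    shell = create_child(init, pid=2)
    print(shell.parent.pid, [c.pid for c in init.children])
```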
Threads
Despite the fact that a thread must execute within a process, the process and its
associated threads are different concepts. Processes are used to group resources together
and threads are the entities scheduled for execution on the CPU.
A thread is a single sequential stream of execution within a process. Because threads have
some of the properties of processes, they are sometimes called lightweight processes. Within a
process, threads allow multiple streams of execution. In many respects, threads are a popular
way to improve application performance through parallelism. On a single CPU, the CPU switches
rapidly back and forth among the threads, giving the illusion that the threads are running in
parallel. Like a traditional process, i.e., a process with one thread, a thread can be in any of
several states (Running, Blocked, Ready or Terminated). Each thread has its own stack, since
threads will generally call different procedures and thus have different execution histories.
In an operating system that has a thread facility, the
basic unit of CPU utilization is a thread. A thread consists of a program counter
(PC), a register set, and a stack space. Threads are not independent of one another like
processes; as a result, a thread shares with its peer threads their code section, data section,
and OS resources such as open files and signals. This shared collection is also known as a task.
Processes Vs Threads
As mentioned earlier, in many respects threads operate in the same way as
processes. Some of the similarities and differences are:
Similarities
Like processes, threads share the CPU, and only one thread is active (running) at a time on a given CPU.
Like processes, threads within a process execute
sequentially.
Like processes, threads can create children.
And like processes, if one thread is blocked, another thread can run.
Differences
Unlike processes, threads are not independent of one another.
Unlike processes, all threads can access every address in the task.
Unlike processes, threads are designed to assist one another. Note that processes might or
might not assist one another because processes may originate from different
users.
Why Threads?
Following are some reasons why threads are used in designing operating systems.
Processes with multiple threads make a great server, for example a printer server.
Because threads can share common data, they do not need to use interprocess
communication.
By their very nature, threads can take advantage of multiprocessors.
Threads are cheap to create in the sense that they only need a stack and storage for
registers.
Threads use very few resources of the operating system in which they are
working. That is, threads do not need a new address space, global data, program code or
operating system resources. Context switching is fast when working with threads. The
reason is that we only have to save and/or restore the PC, SP and registers.
But this cheapness does not come free: the biggest drawback is that there is no
protection between threads.
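The point that threads share data directly, with no protection between them, can be sketched with Python's standard threading module. The function run_workers and the shared counter are illustrative; the lock is the programmer's own safeguard, since nothing in the thread model itself stops one thread from interfering with another's update.

```python
import threading

def run_workers(num_threads=4, increments=1000):
    """Start several threads that all update one shared counter."""
    counter = {"value": 0}          # shared data: visible to every thread
    lock = threading.Lock()         # protection must be added explicitly

    def worker():
        for _ in range(increments):
            with lock:              # only one thread updates at a time
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                    # wait for all threads to finish
    return counter["value"]

if __name__ == "__main__":
    print(run_workers())
```

No pipes or messages are needed to combine the results, because every thread works on the same variable in the same address space.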
User-Level Threads
User-level threads are implemented in user-level libraries, rather than via system calls,
so thread switching does not need to call the operating system or cause an interrupt to the
kernel. In fact, the kernel knows nothing about user-level threads and manages them as if
they were single-threaded processes.
Advantages:
The most obvious advantage of this technique is that a user-level threads package
can be implemented on an Operating System that does not support threads. Some other
advantages are
User-level threads do not require modification to operating systems.
Simple Representation:
Each thread is represented simply by a PC, registers, stack and a small control
block, all stored in the user process address space.
Simple Management:
This simply means that creating a thread, switching between threads and
synchronization between threads can all be done without intervention of the kernel.
Fast and Efficient:
Thread switching is not much more expensive than a procedure call.
Disadvantages:
There is a lack of coordination between threads and the operating system kernel.
Therefore, the process as a whole gets one time slice irrespective of whether the process has
one thread or 1000 threads within it. It is up to each thread to relinquish control to other
threads.
User-level threads require non-blocking system calls, i.e., a multithreaded kernel.
Otherwise, the entire process will block in the kernel, even if there are runnable threads left
in the process. For example, if one thread causes a page fault, the whole process blocks.
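A user-level thread package can be imitated with Python generators: each "thread" runs until it voluntarily yields, and a small run-time scheduler inside the process picks the next one, with no kernel involvement. This is only a sketch of the idea; worker, run_round_robin and demo are made-up names, not a real threads library.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': does one step of work, then gives up the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                       # voluntarily relinquish control

def run_round_robin(threads):
    """Run-time scheduler living entirely in user space."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)            # resume the thread until its next yield
        except StopIteration:
            continue                # thread finished: drop it
        ready.append(thread)        # still alive: back of the ready list

def demo():
    log = []
    run_round_robin([worker("A", 2, log), worker("B", 3, log)])
    return log

if __name__ == "__main__":
    print(demo())
```

If a worker never reached its yield, no other worker would ever run, which mirrors the disadvantage described above: it is up to each thread to relinquish control.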
Kernel-Level Threads
In this method, the kernel knows about and manages the threads. No runtime
system is needed in this case. Instead of thread table in each process, the kernel has a
thread table that keeps track of all threads in the system. In addition, the kernel also
maintains the traditional process table to keep track of processes. Operating Systems
kernel provides system call to create and manage threads.
Advantages:
Because the kernel has full knowledge of all threads, the scheduler may decide to give
more time to a process having a large number of threads than to a process having a small
number of threads.
Kernel-level threads are especially good for applications that frequently block.
Disadvantages:
Kernel-level threads are slow and inefficient. For instance, thread operations
can be hundreds of times slower than those of user-level threads.
Since the kernel must manage and schedule threads as well as processes, it requires a full
thread control block (TCB) for each thread to maintain information about threads. As a
result there is significant overhead and increased kernel complexity.
Advantages of Threads over Multiple Processes
Context Switching
Threads are very inexpensive to create and destroy, and they are inexpensive to
represent. For example, they require space to store the PC, the SP, and the general-purpose
registers, but they do not require space for memory-management information,
information about open files or I/O devices in use, etc. With so little context, it is much
faster to switch between threads. In other words, a context
switch between threads is relatively cheap.
Sharing
Threads allow the sharing of many resources that cannot be shared between processes, for
example the code section, data section, and operating system resources like open files.
Disadvantages of Threads over Multiple Processes
Blocking
The major disadvantage is that if the kernel is single threaded, a system call by
one thread will block the whole process, and the CPU may be idle during the blocking period.
Security
Since there is extensive sharing among threads, there is a potential security
problem. It is quite possible that one thread overwrites the stack of another thread (or
damages shared data), although this is unlikely since threads are meant to cooperate on
a single task.
Application that Benefits from Threads
A proxy server satisfying the requests of a number of computers on a LAN
would benefit from a multi-threaded process. In general, any program that has to do
more than one task at a time could benefit from multithreading. For example, a program
that reads input, processes it, and writes output could have three threads, one for each task.
Application that cannot Benefit from Threads
Any sequential process that cannot be divided into parallel tasks will not benefit
from threads, as each task would block until the previous one completes. For example, a
program that displays the time of day would not benefit from multiple threads.
Resources used in Thread Creation and Process Creation
When a new thread is created it shares its code section, data section and operating
system resources like open files with other threads. But it is allocated its own stack,
register set and a program counter.
The creation of a new process differs from that of a thread mainly in the fact that all the
shared resources of a thread are needed explicitly for each process. So though two
processes may be running the same piece of code they need to have their own copy of the
code in the main memory to be able to run. Two processes also do not share other
resources with each other. This makes the creation of a new process very costly
compared to that of a new thread.
Context Switch
To give each process on a multiprogrammed machine a fair share of the CPU, a
hardware clock generates interrupts periodically. This allows the operating system to
schedule all processes in main memory (using scheduling algorithm) to run on the CPU at
equal intervals. Each time a clock interrupt occurs, the interrupt handler checks how
much time the current running process has used. If it has used up its entire time slice,
then the CPU scheduling algorithm (in kernel) picks a different process to run. Each
switch of the CPU from one process to another is called a context switch.
Major Steps of Context Switching
The values of the CPU registers are saved in the process table of the process that
was running just before the clock interrupt occurred.
The registers are loaded from the process picked by the CPU scheduler to run next.
In a multiprogrammed uniprocessor computing system, context switches occur frequently
enough that all processes appear to be running concurrently. If a process has more than
one thread, the Operating System can use the context switching technique to schedule the
threads so they appear to execute in parallel. This is the case if threads are implemented
at the kernel level. Threads can also be implemented entirely at the user level in run-time
libraries. Since in this case no thread scheduling is provided by the Operating System, it
is the responsibility of the programmer to yield the CPU frequently enough in each thread
so all threads in the process can make progress.
Action of Kernel to Context Switch Among Threads
The threads share a lot of resources with other peer threads belonging to the same
process. So a context switch among threads of the same process is easy. It involves
switching the register set, the program counter and the stack. It is relatively easy for the
kernel to accomplish this task.
Action of kernel to Context Switch Among Processes
Context switches among processes are expensive. Before a process can be
switched its process control block (PCB) must be saved by the operating system. The
PCB consists of the following information:
The process state.
The program counter, PC.
The values of the different registers.
The CPU scheduling information for the process.
Memory management information regarding the process.
Possible accounting information for this process.
I/O status information of the process.
When the PCB of the currently executing process is saved the operating system loads the
PCB of the next process that has to be run on CPU. This is a heavy task and it takes a lot
of time.
Solaris-2 Operating Systems
Introduction
The Solaris 2 operating system supports:
threads at the user-level.
threads at the kernel-level.
symmetric multiprocessing and
real-time scheduling.
The entire thread system in Solaris is depicted in following figure.
At user-level
The user-level threads are supported by a library for creation and scheduling,
and the kernel knows nothing of these threads.
These user-level threads are supported by lightweight processes (LWPs). Each LWP is
connected to exactly one kernel-level thread, whereas each user-level thread is independent of
the kernel.
Many user-level threads may perform one task. These threads may be scheduled
and switched among LWPs without intervention of the kernel.
User-level threads are extremely efficient because no kernel context switch is needed to
block one thread and start another running.
Resource needs of User-level Threads
A user-level thread needs a stack and a program counter. Absolutely no kernel resources
are required. Since the kernel is not involved in scheduling these user-level threads,
switching among user-level threads is fast and efficient.
At Intermediate-level
The lightweight processes (LWPs) are located between the user-level threads and
kernel-level threads. These LWPs serve as "virtual CPUs" on which user-level threads can run.
Each task contains at least one LWP.
The user-level threads are multiplexed on the LWPs of the process.
Resource needs of LWP
An LWP contains a process control block (PCB) with register data, accounting
information and memory information. Therefore, switching between LWPs requires quite
a bit of work and LWPs are relatively slow as compared to user-level threads.
At kernel-level
The standard kernel-level threads execute all operations within the kernel. There
is a kernel-level thread for each LWP, and there are some threads that run only on the
kernel's behalf and have no associated LWP, for example a thread to service disk requests.
By request, a kernel-level thread can be pinned to a processor (CPU). See the rightmost
thread in the figure. The kernel-level threads are scheduled by the kernel's scheduler. If a
kernel-level thread blocks, its LWP and the user-level thread attached to that LWP block as well.
SEE the diagram in NOTES
In modern Solaris 2, a task no longer must block just because a kernel-level thread
blocks; the processor (CPU) is free to run another thread.
Resource needs of Kernel-level Thread
A kernel thread has only a small data structure and a stack. Switching between kernel
threads does not require changing memory access information and therefore kernel-level
threads are relatively fast and efficient.
Unit 2
CPU/Process Scheduling
The assignment of physical processors to processes allows processors to
accomplish work. The problem of determining when processors should be assigned and
to which processes is called processor scheduling or CPU scheduling.
When more than one process is runnable, the operating system must decide which
one to run first. The part of the operating system concerned with this decision is called the
scheduler, and the algorithm it uses is called the scheduling algorithm.
Goals of Scheduling (objectives)
In this section we try to answer the following question: what does the scheduler try to
achieve?
Many objectives must be considered in the design of a scheduling discipline. In
particular, a scheduler should consider fairness, efficiency, response time, turnaround
time, throughput, etc. Some of these goals depend on the system one is using, for
example a batch system, interactive system or real-time system, but there are also some
goals that are desirable in all systems.
General Goals
Fairness
Fairness is important under all circumstances. A scheduler makes sure that each process
gets its fair share of the CPU and no process suffers indefinite postponement. Note
that giving every process equal time is not necessarily fair. Think of safety control and
payroll at a nuclear plant.
Policy Enforcement
The scheduler has to make sure that system's policy is enforced. For example, if the
local policy is safety then the safety control processes must be able to run whenever they
want to, even if it means delay in payroll processes.
Efficiency
The scheduler should keep the system (or in particular the CPU) busy 100 percent of the
time when possible. If the CPU and all the Input/Output devices can be kept running all
the time, more work gets done per second than if some components are idle.
Response Time
A scheduler should minimize the response time for interactive user.
Turnaround
A scheduler should minimize the time batch users must wait for an output.
Throughput
A scheduler should maximize the number of jobs processed per unit time.
A little thought will show that some of these goals are contradictory. It can be shown that
any scheduling algorithm that favors some class of jobs hurts another class of jobs. The
amount of CPU time available is finite, after all.
Preemptive Vs Nonpreemptive Scheduling
The Scheduling algorithms can be divided into two categories with respect to how they
deal with clock interrupts.
Nonpreemptive Scheduling
A scheduling discipline is nonpreemptive if, once a process has been given the CPU, the
CPU cannot be taken away from that process.
Following are some characteristics of nonpreemptive scheduling
In a nonpreemptive system, short jobs are made to wait by longer jobs but the overall
treatment of all processes is fair.
In a nonpreemptive system, response times are more predictable because incoming high
priority jobs cannot displace waiting jobs.
In nonpreemptive scheduling, the scheduler dispatches a new job in the following two situations.
When a process switches from the running state to the waiting state.
When a process terminates.
Preemptive Scheduling
A scheduling discipline is preemptive if, once a process has been given the CPU, the CPU can
be taken away from it.
The strategy of allowing processes that are logically runnable to be temporarily suspended
is called preemptive scheduling, in contrast to the "run to completion" method.
First-Come-First-Served (FCFS) Scheduling
Other names of this algorithm are:
First-In-First-Out (FIFO)
Run-to-Completion
Run-Until-Done
First-Come-First-Served is perhaps the simplest scheduling algorithm. Processes are
dispatched according to their arrival
time on the ready queue. Being a nonpreemptive discipline, once a process has the CPU, it
runs to completion.
FCFS scheduling is fair in the formal or human sense of fairness, but it
is unfair in the sense that long jobs make short jobs wait and unimportant jobs make
important jobs wait.
FCFS is more predictable than most other schemes since jobs are served strictly in arrival
order. The FCFS scheme is not useful in scheduling interactive users because it cannot
guarantee good response time. The code for FCFS scheduling is simple to write and understand.
One of the major drawbacks of this scheme is that the average waiting time is often quite long.
The First-Come-First-Served algorithm is rarely used as a master scheme in modern
operating systems but it is often embedded within other schemes.
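The effect of arrival order on FCFS waiting time can be worked through with a few lines of code. This is a sketch under simple assumptions: all jobs are already in the ready queue at time 0, and the burst times (24, 3 and 3 time units) are made-up example figures.

```python
def fcfs_waiting_times(bursts):
    """Waiting time of each job when jobs run to completion in queue order."""
    waits = []
    clock = 0
    for burst in bursts:
        waits.append(clock)     # a job waits until all earlier jobs finish
        clock += burst
    return waits

def average(values):
    return sum(values) / len(values)

if __name__ == "__main__":
    long_first = fcfs_waiting_times([24, 3, 3])
    short_first = fcfs_waiting_times([3, 3, 24])
    print(long_first, average(long_first))     # long job at the head of the queue
    print(short_first, average(short_first))   # same jobs, long job last
```

With the long job first the waits are 0, 24 and 27 (average 17); with it last they are 0, 3 and 6 (average 3). The same three jobs give very different averages, which is why long jobs making short jobs wait is counted as a drawback.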
Round Robin Scheduling
One of the oldest, simplest, fairest and most widely used algorithms is round robin
(RR). In round robin scheduling, processes are dispatched in a FIFO manner but are
given a limited amount of CPU time called a time-slice or a quantum. If a process does
not complete before its CPU time expires, the CPU is preempted and given to the next
process waiting in the queue. The preempted process is then placed at the back of the ready
list. Round robin scheduling is preemptive (at the end of the time-slice); therefore it is
effective in time-sharing environments in which the system needs to guarantee reasonable
response times for interactive users.
The only interesting issue with the round robin scheme is the length of the quantum.
Setting the quantum too short causes too many context switches and lowers CPU
efficiency. On the other hand, setting the quantum too long may cause poor response time
and approximates FCFS. In any event, the average waiting time under round robin
scheduling is often quite long.
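Round robin can be simulated in a few lines. The sketch below assumes all processes are in the ready queue at time 0 and ignores context-switch cost; the burst times and the quantum of 4 are example figures, and round_robin is an illustrative name.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return the waiting time of each process under round robin."""
    remaining = list(bursts)
    ready = deque(range(len(bursts)))     # FIFO ready list of process indexes
    completion = [0] * len(bursts)
    clock = 0
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])  # run for one quantum or until done
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            ready.append(i)               # preempted: back of the ready list
        else:
            completion[i] = clock
    # waiting time = completion time minus the CPU time actually needed
    return [completion[i] - bursts[i] for i in range(len(bursts))]

if __name__ == "__main__":
    print(round_robin([24, 3, 3], quantum=4))
    print(round_robin([24, 3, 3], quantum=30))
```

With a quantum of 4 the waits are 6, 4 and 7; with a quantum of 30, longer than every burst, the result is 0, 24 and 27, exactly what FCFS gives for the same jobs.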
Shortest-Job-First (SJF) Scheduling
Another name for this algorithm is Shortest-Process-Next (SPN).
Shortest-Job-First (SJF) is a non-preemptive discipline in which the waiting job (or
process) with the smallest estimated run-time-to-completion is run next. In other words,
when the CPU is available, it is assigned to the process that has the smallest next CPU burst.
SJF scheduling is especially appropriate for batch jobs for which the run
times are known in advance. Since the SJF scheduling algorithm gives the minimum
average waiting time for a given set of processes, it is provably optimal in that respect.
The SJF algorithm favors short jobs (or processes) at the expense of longer ones.
The obvious problem with the SJF scheme is that it requires precise knowledge of how long
a job or process will run, and this information is not usually available. The best the SJF
algorithm can do is to rely on user estimates of run times.
In a production environment where the same jobs run regularly, it may be
possible to provide a reasonable estimate of run time, based on the past performance of the
process. But in a development environment users rarely know how their program will
execute. Like FCFS, SJF is nonpreemptive; therefore, it is not useful in a timesharing
environment in which reasonable response time must be guaranteed.
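A small sketch of non-preemptive SJF, assuming all jobs are available at time 0 and that their run times are known exactly (the assumption the notes point out is rarely true in practice). The burst times are example figures.

```python
def sjf_waiting_times(bursts):
    """Waiting time of each job when the shortest job is always run next."""
    # Visit jobs from shortest to longest; ties keep their original order.
    order = sorted(range(len(bursts)), key=lambda i: bursts[i])
    waits = [0] * len(bursts)
    clock = 0
    for i in order:
        waits[i] = clock
        clock += bursts[i]
    return waits

if __name__ == "__main__":
    waits = sjf_waiting_times([6, 8, 7, 3])
    print(waits, sum(waits) / len(waits))
```

For bursts of 6, 8, 7 and 3 the waits are 3, 16, 9 and 0, an average of 7. Running the same jobs in the order given (FCFS) would give waits of 0, 6, 14 and 21, an average of 10.25.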
Shortest-Remaining-Time (SRT) Scheduling
SRT is the preemptive counterpart of SJF and is useful in a time-sharing
environment.
In SRT scheduling, the process with the smallest estimated run-time to
completion is run next, including new arrivals.
In the SJF scheme, once a job begins executing, it runs to completion.
In the SRT scheme, a running process may be preempted by a newly arrived process
with a shorter estimated run-time.
SRT has higher overhead than its counterpart SJF.
SRT must keep track of the elapsed time of the running process and must
handle occasional preemptions.
In this scheme, small processes run almost immediately on arrival. However,
longer jobs have an even longer mean waiting time.
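The preemption rule of SRT can be sketched with a simulation that advances one time unit at a time. The sketch assumes positive integer burst times that are known exactly; the arrival and burst figures are example values, and srt_waiting_times is an illustrative name.

```python
def srt_waiting_times(arrivals, bursts):
    """Waiting times under SRT; assumes positive integer bursts."""
    n = len(bursts)
    remaining = list(bursts)
    completion = [0] * n
    clock = 0
    done = 0
    while done < n:
        # Processes that have arrived and still need CPU time.
        ready = [i for i in range(n) if arrivals[i] <= clock and remaining[i] > 0]
        if not ready:
            clock += 1              # CPU idle until the next arrival
            continue
        # Pick the smallest remaining time; earlier arrival breaks ties.
        i = min(ready, key=lambda j: (remaining[j], arrivals[j]))
        remaining[i] -= 1           # run it for one time unit, then re-decide
        clock += 1
        if remaining[i] == 0:
            completion[i] = clock
            done += 1
    return [completion[i] - arrivals[i] - bursts[i] for i in range(n)]

if __name__ == "__main__":
    waits = srt_waiting_times([0, 1, 2, 3], [8, 4, 9, 5])
    print(waits, sum(waits) / len(waits))
```

Here the first process is preempted at time 1 by the newly arrived 4-unit job, which then waits 0 units, while the 9-unit job waits 15. This matches the observation above that short arrivals run almost immediately and long jobs wait even longer.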
Priority Scheduling
The basic idea is straightforward: each process is assigned a priority, and the runnable
process with the highest priority is allowed to run. Equal-priority processes are scheduled in
FCFS order. The Shortest-Job-First (SJF) algorithm is a special case of the general priority
scheduling algorithm.
An SJF algorithm is simply a priority algorithm where the priority is the inverse of the
(predicted) next CPU burst. That is, the longer the CPU burst, the lower the priority and
vice versa.
Priority can be defined either internally or externally. Internally defined priorities
use some measurable quantities or qualities to compute priority of a process.
Examples of Internal priorities are
Time limits.
Memory requirements.
File requirements, for example, number of open files.
CPU Vs I/O requirements.
Externally defined priorities are set by criteria that are external to operating system such
as
The importance of process.
Type or amount of funds being paid for computer use.
The department sponsoring the work.
Politics.
Priority scheduling can be either preemptive or nonpreemptive.
A preemptive priority algorithm will preempt the CPU if the priority of the
newly arrived process is higher than the priority of the currently running process.
A non-preemptive priority algorithm will simply put the new process at the head
of the ready queue.
A major problem with priority scheduling is indefinite blocking or starvation. A
solution to the problem of indefinite blockage of the low-priority process is aging. Aging
is a technique of gradually increasing the priority of processes that wait in the system for
a long period of time.
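Starvation and aging can be illustrated with a deliberately simple model. Assume one low-priority job is waiting while a fresh high-priority job arrives at every tick and needs one tick of CPU; a larger number means higher priority. The priorities (1 and 5), the horizon of 20 ticks and the function name are all made up for this sketch.

```python
def ticks_until_low_runs(aging_step, horizon=20, low=1, high=5):
    """Tick at which the waiting low-priority job gets the CPU, or None."""
    low_priority = low
    for tick in range(horizon):
        # A new high-priority job competes with the waiting low-priority job.
        if low_priority > high:
            return tick                 # the aged job finally wins the CPU
        # The high-priority job runs; the low-priority job waits and ages.
        low_priority += aging_step
    return None                         # starved for the whole horizon

if __name__ == "__main__":
    print(ticks_until_low_runs(aging_step=0))   # no aging
    print(ticks_until_low_runs(aging_step=1))   # priority grows while waiting
```

Without aging the low-priority job never runs within the horizon; with its priority raised by 1 per tick waited, it overtakes the high-priority arrivals at tick 5.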
Multilevel Queue Scheduling
A multilevel queue scheduling algorithm partitions the ready queue into several
separate queues.
In multilevel queue scheduling, processes are permanently assigned to one
queue, based on some property
of the process, such as
Memory size
Process priority
Process type
The algorithm chooses the process from the occupied queue that has the highest priority,
and runs that process either
preemptively or
non-preemptively.
Each queue has its own scheduling algorithm or policy.
Possibility I
If each queue has absolute priority over lower-priority queues, then no process in a
queue can run unless all the higher-priority queues are empty.
For example, in the above figure no process in the batch queue could run unless the
queues for system processes, interactive processes, and interactive editing processes were
all empty.
Possibility II
If there is a time slice between the queues, then each queue gets a certain amount of
CPU time, which it can then schedule among the processes in its queue. For instance:
80% of the CPU time to foreground queue using RR.
20% of the CPU time to background queue using FCFS.
Since processes do not move between queues, this policy has the advantage of low
scheduling overhead, but it is inflexible.
Multilevel Feedback Queue Scheduling
The multilevel feedback queue scheduling algorithm allows a process to move between
queues. It uses many ready queues and associates a different priority with each queue.
The algorithm chooses the process with the highest priority from the occupied queues and runs
that process either preemptively or non-preemptively. If a process uses too much CPU
time, it is moved to a lower-priority queue. Similarly, a process that waits too long in a
lower-priority queue may be moved to a higher-priority queue. Note that this form of aging
prevents starvation.
For example:
A process entering the ready queue is placed in queue 0.
If it does not finish within 8 milliseconds, it is moved to the tail of queue 1.
If it does not complete within its time slice in queue 1, it is preempted and placed into queue 2.
Processes in queue 2 run on an FCFS basis, only
when queue 0 and queue 1 are empty.
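The three-queue example can be sketched as a small simulation. The 8 millisecond quantum for queue 0 comes from the notes; the 16 millisecond quantum for queue 1 is an assumption made here for illustration, as are the burst times. All processes are assumed to arrive at time 0, so preemption by new arrivals and promotion of long-waiting processes are not modelled.

```python
from collections import deque

def mlfq(bursts, quanta=(8, 16)):
    """Return (completion times, queue in which each process finished)."""
    queues = [deque(), deque(), deque()]
    for pid in range(len(bursts)):
        queues[0].append(pid)               # every new process enters queue 0
    remaining = list(bursts)
    completion = [0] * len(bursts)
    final_queue = [0] * len(bursts)
    clock = 0
    while any(queues):
        # Highest-priority non-empty queue (lowest index) goes first.
        level = next(i for i, q in enumerate(queues) if q)
        pid = queues[level].popleft()
        if level == 2:
            run = remaining[pid]            # queue 2 is FCFS: run to completion
        else:
            run = min(quanta[level], remaining[pid])
        clock += run
        remaining[pid] -= run
        if remaining[pid] == 0:
            completion[pid] = clock
            final_queue[pid] = level
        else:
            queues[level + 1].append(pid)   # used its whole slice: demote
    return completion, final_queue

if __name__ == "__main__":
    print(mlfq([5, 20, 40]))
```

With bursts of 5, 20 and 40 milliseconds, the short process finishes in queue 0, the medium one in queue 1 and the long one only reaches completion in queue 2, after the higher-priority queues have emptied.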
Deadlock
A set of processes is in a deadlock state if each process in the set is waiting for an
event that can be caused only by another process in the set. In other words, each member
of the set of deadlocked processes is waiting for a resource that can be released only by a
deadlocked process. None of the processes can run, none of them can release any resources,
and none of them can be awakened. It is important to note that the number of processes
and the number and kind of resources possessed and requested are unimportant.
The resources may be either physical or logical. Examples of physical resources
are printers, tape drives, memory space, and CPU cycles. Examples of logical
resources are files, semaphores, and monitors.
The simplest example of deadlock is where process 1 has been allocated non-shareable
resource A, say, a tape drive, and process 2 has been allocated non-shareable
resource B, say, a printer. Now, if it turns out that process 1 needs resource B (the printer) to
proceed and process 2 needs resource A (the tape drive) to proceed, and these are the only
two processes in the system, each blocks the other and all useful work in the system
stops. This situation is termed deadlock. The system is in a deadlock state because each
process holds a resource being requested by the other process, and neither process is willing to
release the resource it holds.
Preemptable and Nonpreemptable Resources
Resources come in two flavors: preemptable and nonpreemptable. A preemptable
resource is one that can be taken away from the process with no ill effects. Memory is an
example of a preemptable resource. On the other hand, a nonpreemptable resource is one
that cannot be taken away from a process without causing ill effects. For example, CD
recorders are not preemptable at an arbitrary moment.
Reallocating resources can resolve deadlocks that involve preemptable resources.
Deadlocks that involve nonpreemptable resources are difficult to deal with.
Necessary and Sufficient Deadlock Conditions
1. Mutual Exclusion Condition
The resources involved are non-shareable.
Explanation: At least one resource must be held in a non-shareable mode, that
is, only one process at a time claims exclusive control of the resource. If another process
requests that resource, the requesting process must be delayed until the resource has
been released.
2. Hold and Wait Condition
A requesting process already holds resources while waiting for the requested resources.
Explanation: There must exist a process that is holding a resource already allocated to
it while waiting for additional resources that are currently being held by other processes.
3. No-Preemption Condition
Resources already allocated to a process cannot be preempted.
Explanation: Resources cannot be removed from a process until they are used to completion or
released voluntarily by the process holding them.
4. Circular Wait Condition
The processes in the system form a circular list or chain where each process in the list is
waiting for a resource held by the next process in the list.
As an example, consider the traffic deadlock in the following figure.
Consider each section of the street as a resource.
The mutual exclusion condition applies, since only one vehicle can be on a section of the
street at a time.
The hold-and-wait condition applies, since each vehicle is occupying a section of the street
and waiting to move on to the next section of the street.
The no-preemption condition applies, since a section of the street that is occupied by a
vehicle cannot be taken away from it.
The circular wait condition applies, since each vehicle is waiting for the next vehicle to move.
That is, each vehicle in the traffic is waiting for a section of street held by the next
vehicle in the traffic.
The simple rule to avoid traffic deadlock is that a vehicle should only enter an
intersection if it is assured that it will not have to stop inside the intersection.
It is not possible to have a deadlock involving only a single process. Deadlock
involves a circular “hold-and-wait” condition between two or more processes,
so one process cannot hold a resource and yet be waiting for another resource that it is
itself holding. In addition, in this model deadlock over such resources is not possible
between two threads in a process, because it is the process that holds resources, not the
thread; that is, each thread has access to the resources held by the process. (Threads can
still deadlock on synchronization objects, such as locks, that they hold individually.)
Deadlock Prevention
Elimination of “Mutual Exclusion” Condition
The mutual exclusion condition must hold for non-shareable resources. That is,
several processes cannot simultaneously share a single resource. This condition is
difficult to eliminate because some resources, such as the tape drive and printer, are
inherently non-shareable. Note that shareable resources such as read-only files do not require
mutually exclusive access and thus cannot be involved in deadlock.
Elimination of “Hold and Wait” Condition
There are two possibilities for eliminating the second condition. The first
alternative is that a process be granted all of the resources it needs at once, prior
to execution. The second alternative is to disallow a process from requesting resources
whenever it already holds allocated resources. The first strategy requires that all of the
resources a process will need be requested at once. The system must grant resources
on an “all or none” basis. If the complete set of resources needed by a process is not
currently available, then the process must wait until the complete set is available. While
the process waits, however, it may not hold any resources. Thus the “wait for” condition
is denied and deadlocks simply cannot occur. This strategy can lead to serious waste of
resources. For example, a program requiring ten tape drives must request and receive all
ten drives before it begins executing. If the program needs only one tape drive to begin
execution and does not need the remaining tape drives for several hours, then
substantial computer resources (nine tape drives) will sit idle for several hours. This strategy
can also cause indefinite postponement (starvation), since not all the required resources may
become available at once.
Elimination of “No-preemption” Condition
The no-preemption condition can be alleviated by forcing a process waiting for a
resource that cannot immediately be allocated to relinquish all of its currently held
resources, so that other processes may use them to finish. Suppose a system does allow
processes to hold resources while requesting additional resources. Consider what happens
when a request cannot be satisfied. A process holds resources that a second process may need
in order to proceed, while the second process may hold the resources needed by the first
process. This is a deadlock. This strategy requires that when a process that is holding some
resources is denied a request for additional resources, the process must release its held
resources and, if necessary, request them again together with the additional resources.
Implementation of this strategy denies the “no-preemption” condition effectively.
The cost is high: when a process releases resources, it may lose all its work to that
point. One serious consequence of this strategy is the possibility of indefinite
postponement (starvation). A process might be held off indefinitely as it repeatedly
requests and releases the same resources.
Elimination of “Circular Wait” Condition
The last condition, the circular wait, can be denied by imposing a total ordering
on all of the resource types and then forcing all processes to request the resources in
order (increasing or decreasing). That is, each resource type is assigned a number, and each
process must request resources in numerical order of enumeration. With this rule, the
resource allocation graph can never have a cycle.
For example, provide a global numbering of all the resources, in which the printer is
numbered 2, the plotter 3, and the tape drive 4.
Now the rule is this: processes can request resources whenever they want to, but all
requests must be made in numerical order. A process may request first a printer and then a
tape drive (order: 2, 4), but it may not request first a plotter and then a printer (order: 3,
2). The problem with this strategy is that it may be impossible to find an ordering that
satisfies everyone.
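The ordering rule can be illustrated with ordinary locks. The following is a minimal sketch, not a prescribed implementation: the class OrderedResource and its methods are hypothetical names, and each resource is modelled as a ReentrantLock tagged with its number in the global ordering. Whatever order the caller lists the resources in, they are always locked in increasing number, so no circular wait can form among callers that use this helper.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical resource wrapper: a lock plus its position in the global ordering.
final class OrderedResource {
    final int rank;
    final ReentrantLock lock = new ReentrantLock();

    OrderedResource(int rank) { this.rank = rank; }

    // Locks every requested resource in increasing rank and returns them in that order.
    static OrderedResource[] acquireAll(OrderedResource... wanted) {
        OrderedResource[] sorted = wanted.clone();
        Arrays.sort(sorted, Comparator.comparingInt(r -> r.rank));
        for (OrderedResource r : sorted) r.lock.lock();
        return sorted;
    }

    // Unlocks in the reverse of the acquisition order.
    static void releaseAll(OrderedResource[] held) {
        for (int i = held.length - 1; i >= 0; i--) held[i].lock.unlock();
    }
}
```

For instance, a caller that asks for the tape drive (4) and then the printer (2) still locks the printer first, which matches the permitted order 2, 4 in the text.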
Deadlock Avoidance
This approach to the deadlock problem anticipates deadlock before it actually
occurs. It employs an algorithm to assess the possibility that deadlock could
occur and acts accordingly. This method differs from deadlock prevention, which
guarantees that deadlock cannot occur by denying one of the necessary conditions of
deadlock.
If the necessary conditions for a deadlock are in place, it is still possible to avoid
deadlock by being careful when resources are allocated. Perhaps the most famous
deadlock avoidance algorithm, due to Dijkstra [1965], is the Banker’s algorithm, so
named because the process is analogous to that used by a banker in deciding if a loan can
be safely made.
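In this analogy, customers correspond to processes, credit units to resources, and the
banker to the operating system.
Banker’s Algorithm
Customers   Used   Max
A           0      6
B           0      5
C           0      4
D           0      7
Available units = 10
Fig. 1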
In the above figure, we see four customers, each of whom has been granted a
number of credit units. The banker reserved only 10 units rather than 22 units to service
them. At a certain moment, the situation becomes
Customers   Used   Max
A           1      6
B           1      5
C           2      4
D           4      7
Available units = 2
Fig. 2
Safe State
The key to a state being safe is that there is at least one way for all users to finish.
In terms of the analogy, the state of Fig. 2 is safe because, with 2 units left, the banker can
delay any request except C's, thus letting C finish and release all four of its resources. With
four units in hand, the banker can let either D or B have the necessary units, and so on.
Unsafe State
Consider what would happen if a request from B for one more unit were granted
in Fig. 2.
We would have the following situation
Customers   Used   Max
A           1      6
B           2      5
C           2      4
D           4      7
Available units = 1
This is an unsafe state.
If all the customers, namely A, B, C, and D, asked for their maximum loans, then the banker
could not satisfy any of them and we would have a deadlock.
Important Note:
It is important to note that an unsafe state does not imply the existence or even the
eventual existence of a deadlock. What an unsafe state does imply is simply that
some unfortunate sequence of events might lead to a deadlock.
The Banker's algorithm is thus to consider each request as it occurs and see if granting it
leads to a safe state. If it does, the request is granted; otherwise, it is postponed until later.
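The safe-state test for the single-resource banker example can be sketched as follows. This is a minimal sketch under stated assumptions: the class BankerSketch and its method names are hypothetical, and only one resource type (credit units) is modelled, as in the tables above. A state is treated as safe if the customers can be ordered so that each one's remaining need fits in the units free at that point.

```java
// Hypothetical single-resource banker: used[i] and max[i] per customer, plus the free units.
final class BankerSketch {
    // Returns true if some order exists in which every customer can finish.
    static boolean isSafe(int[] used, int[] max, int available) {
        boolean[] done = new boolean[used.length];
        int free = available;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < used.length; i++) {
                if (!done[i] && max[i] - used[i] <= free) {
                    free += used[i];      // customer i finishes and repays its loan
                    done[i] = true;
                    progress = true;
                }
            }
        }
        for (boolean d : done) {
            if (!d) return false;
        }
        return true;
    }

    // Grants `units` to customer c only if the resulting state is still safe.
    static boolean canGrant(int[] used, int[] max, int available, int c, int units) {
        if (units > available || used[c] + units > max[c]) return false;
        int[] trial = used.clone();
        trial[c] += units;
        return isSafe(trial, max, available - units);
    }
}
```

Applied to the tables above, the state with 2 units available is reported safe, and the request from B for one more unit is refused because it would leave the unsafe state with 1 unit available.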
Deadlock Detection
Deadlock detection is the process of actually determining that a deadlock exists
and identifying the processes and resources involved in the deadlock.
The basic idea is to check allocation against resource availability for all possible
allocation sequences to determine whether the system is in a deadlocked state. Of course, the
deadlock detection algorithm is only half of this strategy. Once a deadlock is detected,
there needs to be a way to recover. Several alternatives exist:
Temporarily preempt resources from deadlocked processes.
Back off a process to some checkpoint, allowing preemption of a needed resource, and
restart the process at the checkpoint later.
Successively kill processes until the system is deadlock free.
These methods are expensive in the sense that each iteration calls the detection algorithm
until the system proves to be deadlock free. The complexity of the algorithm is O(N^2), where
N is the number of processes. Another potential problem is starvation: the same process
may be killed repeatedly.
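When every resource has a single instance, detection is often described in terms of a wait-for graph: a deadlock exists exactly when the graph contains a cycle. The sketch below uses that simpler technique rather than the allocation-sequence check described above; the class WaitForGraph and its methods are hypothetical names, and processes are assumed to be numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wait-for graph: an edge p -> q means process p waits for a resource held by q.
final class WaitForGraph {
    private final List<List<Integer>> waitsFor = new ArrayList<>();

    WaitForGraph(int processes) {
        for (int i = 0; i < processes; i++) waitsFor.add(new ArrayList<>());
    }

    void addWait(int from, int to) {
        waitsFor.get(from).add(to);
    }

    // Depth-first search; reaching a node that is still on the stack means a cycle.
    boolean hasDeadlock() {
        int[] state = new int[waitsFor.size()];   // 0 = unvisited, 1 = on stack, 2 = finished
        for (int p = 0; p < state.length; p++) {
            if (state[p] == 0 && onCycle(p, state)) return true;
        }
        return false;
    }

    private boolean onCycle(int p, int[] state) {
        state[p] = 1;
        for (int q : waitsFor.get(p)) {
            if (state[q] == 1) return true;
            if (state[q] == 0 && onCycle(q, state)) return true;
        }
        state[p] = 2;
        return false;
    }
}
```

The two-process example from the start of this section (process 1 waits for process 2 and vice versa) produces a two-node cycle, while a simple chain of waits does not.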
File System Implementation
File-System Structure
File-System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Log-Structured File Systems
NFS
Example: WAFL File System
Objectives
To describe the details of implementing local file systems and directory structures
To describe the implementation of remote file systems
To discuss block allocation and free-block algorithms and trade-offs
File-System Structure
File structure
Logical storage unit
Collection of related information
File system resides on secondary storage (disks)
File system organized into layers
File control block – storage structure consisting of information about a file
Layered File System
A Typical File Control Block
The following figure illustrates the necessary file system structures provided by the
operating systems.
Virtual File Systems
Virtual File Systems (VFS) provide an object-oriented way of implementing file
systems. VFS allows the same system call interface (the API) to be used for different
types of file systems.
The API is to the VFS interface, rather than any specific type of file system.
Schematic View of Virtual File System
Directory Implementation
Linear list of file names with pointer to the data blocks.
simple to program
time-consuming to execute
Hash Table – linear list with hash data structure.
decreases directory search time
collisions – situations where two file names hash to the same location
fixed size
Allocation Methods
An allocation method refers to how disk blocks are allocated for files:
Contiguous allocation
Linked allocation
Indexed allocation
Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk
Simple – only starting location (block #) and length (number of blocks) are
required
Random access
Wasteful of space (dynamic storage-allocation problem)
Files cannot grow
Mapping from logical to physical
Contiguous Allocation of Disk Space
Extent-Based Systems
Many newer file systems (e.g., Veritas File System) use a modified contiguous
allocation scheme
Extent-based file systems allocate disk blocks in extents
An extent is a contiguous set of disk blocks
Extents are allocated for file allocation
A file consists of one or more extents.
Linked Allocation
Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
Simple – need only starting address
Free-space management system – no waste of space
No random access
Mapping
Indexed Allocation
Brings all pointers together into the index block.
Logical view.
Need index table
Random access
Dynamic access without external fragmentation, but with the overhead of an
index block.
Mapping from logical to physical in a file of maximum size of 256K words and block
size of 512 words. We need only 1 block for index table.
Mapping from logical to physical in a file of unbounded length (block size of 512 words).
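The logical-to-physical mappings for the three allocation methods can be sketched side by side. This is a hedged illustration: the class BlockMapping and its methods are hypothetical, the block size of 512 words is taken from the text, and the linked case assumes that one word of each block is used for the pointer to the next block, leaving 511 data words. Note also that a 256K-word file has 256K / 512 = 512 blocks, so its 512 pointers fit in one 512-word index block.

```java
// Hypothetical helper that maps a logical word address to (block, offset) for each method.
final class BlockMapping {
    static final int BLOCK_WORDS = 512;

    // Contiguous: block = start + LA / 512, offset = LA mod 512.
    static int[] contiguous(int logicalAddress, int startBlock) {
        return new int[] { startBlock + logicalAddress / BLOCK_WORDS,
                           logicalAddress % BLOCK_WORDS };
    }

    // Linked (assumed layout: word 0 of each block holds the next-block pointer).
    // Returns the position of the block in the chain and the offset inside that block.
    static int[] linked(int logicalAddress) {
        int dataWords = BLOCK_WORDS - 1;
        return new int[] { logicalAddress / dataWords,
                           logicalAddress % dataWords + 1 };
    }

    // Indexed with a single index block: entry LA / 512 names the physical block.
    static int[] indexed(int logicalAddress, int[] indexBlock) {
        return new int[] { indexBlock[logicalAddress / BLOCK_WORDS],
                           logicalAddress % BLOCK_WORDS };
    }
}
```

The linked case shows why random access is poor there: reaching the block at chain position Q means following Q pointers from the start of the file.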
Free-Space Management
Bit map requires extra space
Example:
block size = 2^12 bytes
disk size = 2^30 bytes (1 gigabyte)
n = 2^30/2^12 = 2^18 bits (or 32K bytes)
Easy to get contiguous files
Linked list (free list)
Cannot get contiguous space easily
No waste of space
Grouping
Counting
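A bit-map free list can be sketched with java.util.BitSet. This is a minimal sketch, not a file-system implementation: the class FreeSpaceBitmap and its methods are hypothetical, and it assumes that a set bit marks an allocated block, the convention implied by the consistency rule later in these notes (textbooks differ on this).

```java
import java.util.BitSet;

// Hypothetical bit-map free list: bit i set means block i is in use (assumed convention).
final class FreeSpaceBitmap {
    private final BitSet allocated;
    private final int blocks;

    FreeSpaceBitmap(long diskBytes, int blockBytes) {
        this.blocks = (int) (diskBytes / blockBytes);
        this.allocated = new BitSet(blocks);
    }

    // Space the map itself needs: one bit per block.
    int sizeInBytes() {
        return blocks / 8;
    }

    // Allocates the lowest-numbered free block, or returns -1 if the disk is full.
    int allocate() {
        int i = allocated.nextClearBit(0);
        if (i >= blocks) return -1;
        allocated.set(i);
        return i;
    }

    void free(int block) {
        allocated.clear(block);
    }
}
```

For the example above (1 gigabyte disk, 2^12-byte blocks) the map holds 2^18 bits, which is 32K bytes.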
Free-Space Management
Need to protect:
Pointer to free list
Bit map
Must be kept on disk
Copy in memory and disk may differ
Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i]
= 0 on disk
Solution:
Set bit[i] = 1 on disk
Allocate block[i]
Set bit[i] = 1 in memory
Linked Free Space List on Disk
Efficiency and Performance
Efficiency dependent on:
disk allocation and directory algorithms
types of data kept in file’s directory entry
Performance
disk cache – separate section of main memory for frequently used blocks
free-behind and read-ahead – techniques to optimize sequential access
improve PC performance by dedicating section of memory as virtual disk, or RAM disk
Page Cache
A page cache caches pages rather than disk blocks using virtual memory techniques
Memory-mapped I/O uses a page cache
Routine I/O through the file system uses the buffer (disk) cache
This leads to the following figure
I/O Without a Unified Buffer Cache
Unified Buffer Cache
A unified buffer cache uses the same page cache to cache both memory-mapped
pages and ordinary file system I/O
I/O Using a Unified Buffer Cache
Recovery
Consistency checking – compares data in directory structure with data blocks on disk,
and tries to fix inconsistencies
Use system programs to back up data from disk to another storage device (floppy
disk, magnetic tape, other magnetic disk, optical)
Recover lost file or disk by restoring data from backup
Log Structured File Systems
Log structured (or journaling) file systems record each update to the file system
as a transaction
All transactions are written to a log
A transaction is considered committed once it is written to the log
However, the file system may not yet be updated
The transactions in the log are asynchronously written to the file system
When the file system is modified, the transaction is removed from the log
If the file system crashes, all remaining transactions in the log must still be performed
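The commit, apply and recover steps above can be sketched in memory. This is only an illustration under stated assumptions: the class JournalSketch, its record format (a block name and its new contents) and the map standing in for the on-disk file system are all hypothetical, and nothing here is durable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a journal in front of a file system.
final class JournalSketch {
    private static final class Update {
        final String block;
        final String data;
        Update(String block, String data) { this.block = block; this.data = data; }
    }

    private final Deque<Update> log = new ArrayDeque<>();
    private final Map<String, String> disk = new HashMap<>();

    // A transaction counts as committed once it is in the log.
    void commit(String block, String data) {
        log.addLast(new Update(block, data));
    }

    // Applies the oldest logged update to the file system, then removes it from the log.
    boolean applyNext() {
        Update u = log.peekFirst();
        if (u == null) return false;
        disk.put(u.block, u.data);
        log.removeFirst();
        return true;
    }

    // After a crash, every transaction still in the log is performed in order.
    void recover() {
        while (applyNext()) { /* replay until the log is empty */ }
    }

    int pending() { return log.size(); }

    String read(String block) { return disk.get(block); }
}
```

Because an update is removed from the log only after it has been applied, a crash between the two steps at worst replays an update that was already written.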
The Sun Network File System (NFS)
An implementation and a specification of a software system for accessing remote
files across LANs (or WANs)
The implementation is part of the Solaris and SunOS operating systems running on Sun
workstations using an unreliable datagram protocol (UDP/IP protocol and Ethernet)
Interconnected workstations are viewed as a set of independent machines with independent
file systems, which allows sharing among these file systems in a transparent manner
A remote directory is mounted over a local file system directory
The mounted directory looks like an integral subtree of the local file system, replacing
the subtree descending from the local directory
Specification of the remote directory for the mount operation is nontransparent; the host
name of the remote directory has to be provided
Files in the remote directory can then be accessed in a transparent manner
Subject to access-rights accreditation, potentially any file system (or directory within a
file system) can be mounted remotely on top of any local directory
NFS is designed to operate in a heterogeneous environment of different machines, operating
systems, and network architectures; the NFS specification is independent of these media
This independence is achieved through the use of RPC primitives built on top of
an External Data Representation (XDR) protocol used between two implementation-independent
interfaces
The NFS specification distinguishes between the services provided by a mount
mechanism and the actual remote-file-access services
NFS Mount Protocol
Establishes initial logical connection between server and client
Mount operation includes name of remote directory to be mounted and name of server
machine storing it
Mount request is mapped to corresponding RPC and forwarded to mount server running
on server machine
Export list – specifies local file systems that server exports for mounting, along with
names of machines that are permitted to mount them
Following a mount request that conforms to its export list, the server returns a file
handle—a key for further accesses
File handle – a file-system identifier, and an inode number to identify the mounted
directory within the exported file system
The mount operation changes only the user’s view and does not affect the server side
NFS Protocol
Provides a set of remote procedure calls for remote file operations. The procedures
support the following operations:
searching for a file within a directory
reading a set of directory entries
manipulating links and directories
accessing file attributes
reading and writing files
NFS servers are stateless; each request has to provide a full set of arguments
(NFS V4 is just coming available – very different, stateful)
Modified data must be committed to the server’s disk before results are returned to the
client (lose advantages of caching)
The NFS protocol does not provide concurrency-control mechanisms
Three Major Layers of NFS Architecture
UNIX file-system interface (based on the open, read, write, and close calls, and file
descriptors)
Virtual File System (VFS) layer – distinguishes local files from remote ones, and local
files are further distinguished according to their file-system types
The VFS activates file-system-specific operations to handle local requests according to
their file-system types
Calls the NFS protocol procedures for remote requests
NFS service layer – bottom layer of the architecture
Implements the NFS protocol
Path-name translation is performed by breaking the path into component names and
performing a separate NFS lookup call for every pair of component name and directory vnode
To make lookup faster, a directory name lookup cache on the client’s side holds the
vnodes for remote directory names
NFS Remote Operations
Nearly one-to-one correspondence between regular UNIX system calls and the
NFS protocol RPCs (except opening and closing files)
NFS adheres to the remote-service paradigm, but employs buffering and caching
techniques for the sake of performance
File-blocks cache – when a file is opened, the kernel checks with the remote server
whether to fetch or revalidate the cached attributes
Cached file blocks are used only if the corresponding cached attributes are up to date
File-attribute cache – the attribute cache is updated whenever new attributes arrive from
the server
Clients do not free delayed-write blocks until the server confirms that the data have been
written to disk
Example: WAFL File System
Used on Network Appliance “Filers” – distributed file system appliances
“Write-anywhere file layout”
Serves up NFS, CIFS, http, ftp
Random I/O optimized, write optimized
NVRAM for write caching
Similar to Berkeley Fast File System, with extensive modifications
File-System Interface
File Concept
Access Methods
Directory Structure
File-System Mounting
File Sharing
Protection
Objectives
To explain the function of file systems
To describe the interfaces to file systems
To discuss file-system design tradeoffs, including access methods, file
sharing, file locking, and directory structures
To explore file-system protection
File Concept
Contiguous logical address space
Types:
Data
numeric
character
binary
Program
File Structure
None - sequence of words, bytes
Simple record structure
Lines
Fixed length
Variable length
Complex Structures
Formatted document
Relocatable load file
Can simulate last two with first method by inserting appropriate control characters
Who decides:
Operating system
Program
File Attributes
Name – only information kept in human-readable form
Identifier – unique tag (number) identifies file within file system
Type – needed for systems that support different types
Location – pointer to file location on device
Size – current file size
Protection – controls who can do reading, writing, executing
Time, date, and user identification – data for protection, security, and usage
monitoring
Information about files is kept in the directory structure, which is maintained on the disk
File is an abstract data type
Create
Write
Read
Reposition within file
Delete
Truncate
File Operations
Open(Fi) – search the directory structure on disk for entry Fi, and move the content of
entry to memory
Close (Fi) – move the content of entry Fi in memory to directory structure on disk
Open Files
Several pieces of data are needed to manage open files:
File pointer: pointer to last read/write location, per process that has the file open
File-open count: counter of number of times a file is open – to allow removal of data
from open-file table when the last process closes it
Disk location of the file: cache of data access information
Access rights: per-process access mode information
Open File Locking
Provided by some operating systems and file systems
Mediates access to a file
Mandatory or advisory:
Mandatory – access is denied depending on locks held and requested
Advisory – processes can find status of locks and decide what to do
File Locking Example – Java API
import java.io.*;
import java.nio.channels.*;
public class LockingExample {
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;
    public static void main(String[] args) throws IOException {
        FileLock sharedLock = null;
        FileLock exclusiveLock = null;
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile("file.txt", "rw");
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            long half = raf.length() / 2;
            // this locks the first half of the file - exclusive
            exclusiveLock = ch.lock(0, half, EXCLUSIVE);
            /** Now modify the data . . . */
            // release the lock
            exclusiveLock.release();
            // this locks the second half of the file - shared
            // (the second argument of lock() is a size, not an end position)
            sharedLock = ch.lock(half, raf.length() - half, SHARED);
            /** Now read the data . . . */
            // release the lock
            sharedLock.release();
        } catch (java.io.IOException ioe) {
            System.err.println(ioe);
        } finally {
            // releasing an already released lock has no effect
            if (exclusiveLock != null)
                exclusiveLock.release();
            if (sharedLock != null)
                sharedLock.release();
            // close the file (and its channel) last
            if (raf != null)
                raf.close();
        }
    }
}
File Types – Name, Extension
Access Methods
Sequential Access
read next
write next
reset
no read after last write
(rewrite)
Direct Access
read n
write n
position to n
read next
write next
rewrite n
n = relative block number
Sequential-access File
Simulation of Sequential Access on a Direct-access File
Example of Index and Relative Files
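The simulation of sequential access on a direct-access file keeps a current position cp: reset sets cp to 0, and read next or write next accesses block cp and then advances it. A minimal sketch follows; the class SequentialOverDirect is a hypothetical name, and the direct-access file is modelled as an array of blocks indexed by relative block number.

```java
// Hypothetical wrapper that offers sequential operations on top of direct (indexed) access.
final class SequentialOverDirect {
    private final String[] blocks;   // direct-access file: block n is blocks[n]
    private int cp = 0;              // current position

    SequentialOverDirect(String[] blocks) {
        this.blocks = blocks;
    }

    // reset: cp = 0
    void reset() {
        cp = 0;
    }

    // read next: read cp; cp = cp + 1
    String readNext() {
        String data = blocks[cp];
        cp = cp + 1;
        return data;
    }

    // write next: write cp; cp = cp + 1
    void writeNext(String data) {
        blocks[cp] = data;
        cp = cp + 1;
    }
}
```

The reverse simulation, direct access on a purely sequential file, is much less efficient, since reaching block n means reading past all earlier blocks.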
Directory Structure
A collection of nodes containing information about all files
A Typical File-system Organization
Operations Performed on Directory
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
Organize the Directory (Logically) to Obtain
Efficiency – locating a file quickly
Naming – convenient to users
Two users can have same name for different files
The same file can have several different names
Grouping – logical grouping of files by properties, (e.g., all Java programs, all
games, …)
Single-Level Directory
A single directory for all users
Two-Level Directory
Separate directory for each user
Tree-Structured Directories
Efficient searching
Grouping Capability
Current directory (working directory)
Absolute or relative path name
Creating a new file is done in current directory
Delete a file
rm <file-name>
Creating a new subdirectory is done in current directory
mkdir <dir-name>
Example: if in current directory /mail
mkdir count
Acyclic-Graph Directories
Have shared subdirectories and files
Two different names (aliasing)
If dict deletes list, a dangling pointer results
Solutions:
Backpointers, so we can delete all pointers
Variable size records a problem
Backpointers using a daisy chain organization
Entry-hold-count solution
New directory entry type
Link – another name (pointer) to an existing file
Resolve the link – follow pointer to locate the file
General Graph Directory
How do we guarantee no cycles?
Allow only links to file not subdirectories
Garbage collection
Every time a new link is added use a cycle detection
algorithm to determine whether it is OK
File System Mounting
A file system must be mounted before it can be accessed
An unmounted file system (e.g., Fig. 11-11(b)) is mounted at a mount point
(a) Existing. (b) Unmounted Partition
Mount Point
File Sharing
Sharing of files on multi-user systems is desirable
Sharing may be done through a protection scheme
On distributed systems, files may be shared across a network
Network File System (NFS) is a common distributed file-sharing method
File Sharing – Multiple Users
User IDs identify users, allowing permissions and protections to be per-user
Group IDs allow users to be in groups, permitting group access rights
File Sharing – Remote File Systems
Uses networking to allow file system access between systems
Manually via programs like FTP
Automatically, seamlessly using distributed file systems
Semi automatically via the world wide web
Client-server model allows clients to mount remote file systems from servers
Server can serve multiple clients
Client and user-on-client identification is insecure or complicated
NFS is standard UNIX client-server file sharing protocol
CIFS is standard Windows protocol
Standard operating system file calls are translated into remote calls
Distributed Information Systems (distributed naming services) such as LDAP, DNS,
NIS, Active Directory implement unified access to information needed for remote
computing
File Sharing – Failure Modes
Remote file systems add new failure modes, due to network failure, server failure
Recovery from failure can involve state information about status of each remote request
Stateless protocols such as NFS include all information in each request, allowing easy
recovery but less security
File Sharing – Consistency Semantics
Consistency semantics specify how multiple users are to access a shared file
simultaneously
Similar to process synchronization algorithms
Tend to be less complex due to disk I/O and network latency (for remote file systems)
Andrew File System (AFS) implemented complex remote file sharing semantics
Unix file system (UFS) implements:
Writes to an open file visible immediately to other users of the same open file
Sharing file pointer to allow multiple users to read and write concurrently
AFS has session semantics
Writes only visible to sessions starting after the file is closed
Protection
File owner/creator should be able to control:
Types of access
Read
Write
Execute
Append
Delete
List
Access Lists and Groups
Mode of access: read, write, execute
Three classes of users
                         RWX
a) owner access     7    111
b) group access     6    110
c) public access    1    001
Ask manager to create a group (unique name), say G, and add some users to the group.
For a particular file (say game) or subdirectory, define an appropriate access.
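The three access digits above (7, 6, 1) are simply three groups of R, W and X bits. The sketch below decodes such a mode; the class ModeBits and its methods are hypothetical names used only for illustration, and the numbering of the user classes (0 = owner, 1 = group, 2 = public) is an assumption of this sketch.

```java
// Hypothetical decoder for a three-digit octal access mode such as 0761.
final class ModeBits {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    // Renders the mode as nine characters, e.g. 0761 becomes "rwxrw---x".
    static String render(int mode) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 6; shift >= 0; shift -= 3) {
            int bits = (mode >> shift) & 7;
            sb.append((bits & READ) != 0 ? 'r' : '-');
            sb.append((bits & WRITE) != 0 ? 'w' : '-');
            sb.append((bits & EXECUTE) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    // who: 0 = owner, 1 = group, 2 = public; want: any combination of READ, WRITE, EXECUTE.
    static boolean allows(int mode, int who, int want) {
        int bits = (mode >> (6 - 3 * who)) & 7;
        return (bits & want) == want;
    }
}
```

With the mode from the table, the owner may read, write and execute, the group may read and write, and everyone else may only execute.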
Mass-Storage Systems
Overview of Mass Storage Structure
Disk Structure
Disk Attachment
Disk Scheduling
Disk Management
Swap-Space Management
RAID Structure
Stable-Storage Implementation
Tertiary Storage Devices
Operating System Issues
Performance Issues
Objectives
Describe the physical structure of secondary and tertiary storage devices and
the resulting effects on the uses of the devices
Explain the performance characteristics of mass-storage devices
Discuss operating-system services provided for mass storage, including RAID and HSM
Overview of Mass Storage Structure
Magnetic disks provide bulk of secondary storage of modern computers
Drives rotate at 60 to 200 times per second
Transfer rate is rate at which data flow between drive and computer
Positioning time (random-access time) is time to move disk arm to desired cylinder
(seek time) and time for desired sector to rotate under the disk head (rotational latency)
Head crash results from disk head making contact with the disk surface
That’s bad
Disks can be removable
Drive attached to computer via I/O bus
Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI
Host controller in computer uses bus to talk to disk controller built into drive or storage
array
Moving-head Disk Mechanism
Magnetic tape
Was early secondary-storage medium
Relatively permanent and holds large quantities of data
Access time slow
Random access ~1000 times slower than disk
Mainly used for backup, storage of infrequently-used data, transfer medium between
systems
Kept in spool and wound or rewound past read-write head
Once data under head, transfer rates comparable to disk
20-200GB typical storage
Common technologies are 4mm, 8mm, 19mm, LTO-2 and SDLT
Disk Structure
Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the
logical block is the smallest unit of transfer.
The 1-dimensional array of logical blocks is mapped into the sectors of the disk
sequentially.
Sector 0 is the first sector of the first track on the outermost cylinder.
Mapping proceeds in order through that track, then the rest of the tracks in that cylinder,
and then through the rest of the cylinders from outermost to innermost.
Disk Attachment
Host-attached storage accessed through I/O ports talking to I/O busses
SCSI itself is a bus, up to 16 devices on one cable, SCSI initiator requests operation and
SCSI targets perform tasks
Each target can have up to 8 logical units (disks attached to device controller)
FC is high-speed serial architecture
Can be switched fabric with 24-bit address space – the basis of storage area networks
(SANs) in which many hosts attach to many storage units
Can be arbitrated loop (FC-AL) of 126 devices
Network-Attached Storage
Network-attached storage (NAS) is storage made available over a network rather than
over a local connection (such as a bus)
NFS and CIFS are common protocols
Implemented via remote procedure calls (RPCs) between host and storage
New iSCSI protocol uses IP network to carry the SCSI protocol
Storage Area Network
Common in large storage environments (and becoming more common)
Multiple hosts attached to multiple storage arrays - flexible
Disk Scheduling
The operating system is responsible for using hardware efficiently — for the disk drives,
this means having a fast access time and disk bandwidth.
Access time has two major components
Seek time is the time for the disk arm to move the heads to the cylinder containing the
desired sector.
Rotational latency is the additional time waiting for the disk to rotate the desired sector to
the disk head.
Minimize seek time
Seek time ≈ seek distance
Disk bandwidth is the total number of bytes transferred, divided by the total time between
the first request for service and the completion of the last transfer.
Several algorithms exist to schedule the servicing of disk I/O requests.
We illustrate them with a request queue of cylinder numbers (cylinders 0-199):
98, 183, 37, 122, 14, 124, 65, 67
The head starts at cylinder 53.
FCFS
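First-come, first-served services the queue strictly in arrival order. A minimal sketch that adds up the head movement for the example queue (the function name `fcfs_head_movement` is hypothetical):

```python
def fcfs_head_movement(start, requests):
    """Total cylinders traversed when requests are served in arrival order."""
    total, position = 0, start
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, queue))  # 640
```

The large swings (for example 183 to 37, then 122 to 14) are what the other algorithms try to avoid.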
SSTF
Selects the request with the minimum seek time from the current head position.
SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.
Illustration shows total head movement of 236 cylinders.
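The 236-cylinder figure can be reproduced with a short sketch. The helper names `sstf_order` and `head_movement` are hypothetical, and ties between equally distant requests are broken by queue order, which is an assumption.

```python
def sstf_order(start, requests):
    """Service order when the closest pending request is always chosen next."""
    pending = list(requests)
    order, position = [], start
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def head_movement(start, order):
    """Total cylinders traversed when servicing `order` from `start`."""
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
order = sstf_order(53, queue)
print(order)                     # [65, 67, 37, 14, 98, 122, 124, 183]
print(head_movement(53, order))  # 236
```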
SCAN
The disk arm starts at one end of the disk, and moves toward the other end, servicing
requests until it gets to the other end of the disk, where the head movement is reversed
and servicing continues.
Sometimes called the elevator algorithm.
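Illustration shows total head movement of 236 cylinders when the arm first sweeps all the
way to cylinder 0 (the often-quoted figure of 208 applies only if the arm reverses at the
last request, cylinder 14, as the LOOK variant does).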
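A sketch of the elevator sweep for the same queue, assuming the arm initially moves toward cylinder 0. The function name and the `go_to_edge` flag are hypothetical. With `go_to_edge=True` the arm travels to cylinder 0 before reversing (strict SCAN, 236 cylinders); with `go_to_edge=False` it reverses at the last request in that direction (LOOK behaviour, 208 cylinders).

```python
def scan_toward_zero(start, requests, go_to_edge=True):
    """Return (service order, total head movement) for an arm that first
    moves toward cylinder 0 and then reverses."""
    lower = sorted((c for c in requests if c <= start), reverse=True)
    upper = sorted(c for c in requests if c > start)
    # Cylinder at which the arm turns around.
    turn = 0 if go_to_edge else (lower[-1] if lower else start)
    total = start - turn
    if upper:
        total += upper[-1] - turn
    return lower + upper, total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_toward_zero(53, queue))                       # ([37, 14, 65, 67, 98, 122, 124, 183], 236)
print(scan_toward_zero(53, queue, go_to_edge=False)[1])  # 208
```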
C-SCAN
Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes.
When it reaches the other end, however, it immediately returns to the beginning of the
disk, without servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the last cylinder to the first
one.
C-LOOK
Version of C-SCAN
Arm only goes as far as the last request in each direction, then reverses direction
immediately, without first going all the way to the end of the disk.
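C-SCAN and C-LOOK can be compared on the example queue with the sketch below. It assumes the arm sweeps toward higher cylinder numbers and that the return trip counts as head movement even though no requests are serviced on it; textbooks differ on that convention. All function names are hypothetical.

```python
def circular_order(start, requests):
    """Service order for an upward sweep that wraps around to the low end."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower


def c_scan_movement(start, requests, last_cylinder=199):
    """Head movement when the arm runs to the last cylinder, returns to 0,
    and sweeps upward again."""
    lower = [c for c in requests if c < start]
    total = last_cylinder - start
    if lower:
        total += last_cylinder + max(lower)  # return trip plus second sweep
    return total


def c_look_movement(start, requests):
    """Head movement when the arm only travels between the extreme requests."""
    upper = [c for c in requests if c >= start]
    lower = [c for c in requests if c < start]
    total, position = 0, start
    if upper:
        total += max(upper) - position
        position = max(upper)
    if lower:
        total += position - min(lower)    # jump back to the lowest request
        total += max(lower) - min(lower)  # sweep up through the rest
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(circular_order(53, queue))   # [65, 67, 98, 122, 124, 183, 14, 37]
print(c_scan_movement(53, queue))  # 382
print(c_look_movement(53, queue))  # 322
```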
Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal
SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
Performance depends on the number and types of requests.
Requests for disk service can be influenced by the file-allocation method.
The disk-scheduling algorithm should be written as a separate module of the operating
system, allowing it to be replaced with a different algorithm if necessary.
Either SSTF or LOOK is a reasonable choice for the default algorithm.
Disk Management
Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk
controller can read and write.
To use a disk to hold files, the operating system still needs to record its own data
structures on the disk.
Partition the disk into one or more groups of cylinders.
Logical formatting or “making a file system”.
Boot block initializes system.
The bootstrap is stored in ROM.
Bootstrap loader program.
Methods such as sector sparing used to handle bad blocks.
Booting from a Disk in Windows 2000
Swap-Space Management
Swap-space — Virtual memory uses disk space as an extension of main memory.
Swap-space can be carved out of the normal file system, or, more commonly, it can be
in a separate disk partition.
Swap-space management
4.3BSD allocates swap space when process starts; holds text segment (the program) and
data segment.
Kernel uses swap maps to track swap-space use.
Solaris 2 allocates swap space only when a page is forced out of physical memory,
not when the virtual memory page is first created.
Data Structures for Swapping on Linux Systems
RAID Structure
RAID – multiple disk drives provide reliability via redundancy.
RAID is arranged into six different levels.
Several improvements in disk-use techniques involve the use of multiple disks working
cooperatively.
Disk striping uses a group of disks as one storage unit.
RAID schemes improve performance and improve the reliability of the storage system
by storing redundant data.
Mirroring or shadowing keeps a duplicate of each disk.
Block interleaved parity uses much less redundancy.
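Parity needs less redundancy than mirroring because one parity block protects a whole stripe. A minimal sketch, assuming byte-wise XOR parity over equally sized blocks and a single failed disk; the block contents and the name `parity` are made up for illustration.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks plus one parity disk (hypothetical contents).
data = [b"disk", b"unit", b"raid"]
p = parity(data)

# Simulate losing the second disk and rebuild it from the survivors and parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt)  # b'unit'
```

Here three data blocks cost one extra block of redundancy, whereas mirroring would cost three.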
RAID Levels
RAID (0 + 1) and (1 + 0)
Stable-Storage Implementation
Write-ahead log scheme requires stable storage.
To implement stable storage:
Replicate information on more than one nonvolatile storage medium, choosing media with
independent failure modes.
Update information in a controlled manner to ensure that we can recover the stable data
after any failure during data transfer or recovery.
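One common way to realise the two rules is to keep two replicas of every block, write the second only after the first has completed, and compare them during recovery. The sketch below models that idea in memory; the class `StableBlock`, the CRC32 checksum, and the `crash_after_first` flag are assumptions for illustration, not a description of any particular system.

```python
import zlib


class StableBlock:
    """Two replicas of one logical block, each stored with a CRC32 checksum."""

    def __init__(self):
        self.copies = [self._seal(b""), self._seal(b"")]

    @staticmethod
    def _seal(data):
        return (data, zlib.crc32(data))

    @staticmethod
    def _valid(copy):
        data, checksum = copy
        return zlib.crc32(data) == checksum

    def write(self, data, crash_after_first=False):
        self.copies[0] = self._seal(data)   # update the first replica
        if crash_after_first:
            return                          # simulated failure mid-update
        self.copies[1] = self._seal(data)   # second replica only afterwards

    def recover(self):
        first, second = self.copies
        if not self._valid(first):
            self.copies[0] = second         # first replica damaged
        elif not self._valid(second) or first != second:
            self.copies[1] = first          # second replica damaged or stale

    def read(self):
        return self.copies[0][0]


block = StableBlock()
block.write(b"old")
block.write(b"new", crash_after_first=True)
block.recover()
print(block.read(), block.copies[0] == block.copies[1])  # b'new' True
```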
Tertiary Storage Devices
Low cost is the defining characteristic of tertiary storage.
Generally, tertiary storage is built using removable media
Common examples of removable media are floppy disks and CD-ROMs; other types
are available.
Removable Disks
Floppy disk — thin flexible disk coated with magnetic material, enclosed in a protective
plastic case.
Most floppies hold about 1 MB; similar technology is used for removable disks that hold
more than 1 GB.
Removable magnetic disks can be nearly as fast as hard disks, but they are at a
greater risk of damage from exposure.
A magneto-optic disk records data on a rigid platter coated with magnetic
material. Laser heat is used to amplify a large, weak magnetic field to record a bit.
Laser light is also used to read data (Kerr effect).
The magneto-optic head flies much farther from the disk surface than a magnetic disk
head, and the magnetic material is covered with a protective layer of plastic or glass;
resistant to head crashes.
Optical disks do not use magnetism; they employ special materials that are altered
by laser light.
WORM Disks
The data on read-write disks can be modified over and over.
WORM (“Write Once, Read Many Times”) disks can be written only once.
Thin aluminum film sandwiched between two glass or plastic platters.
To write a bit, the drive uses laser light to burn a small hole through the aluminum;
information can be destroyed but not altered.
Very durable and reliable.
Read-only disks, such as CD-ROM and DVD, come from the factory with the data
prerecorded.
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is
much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g.,
backup copies of disk data, holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between
tape drives and storage slots in a tape library.
stacker – library that holds a few tapes
silo – library that holds thousands of tapes
A disk-resident file can be archived to tape for low cost storage; the computer can stage it
back into disk storage for active use.
Operating System Issues
Major OS jobs are to manage physical devices and to present a virtual machine
abstraction to applications
For hard disks, the OS provides two abstractions:
Raw device – an array of data blocks.
File system – the OS queues and schedules the interleaved requests from several
applications.
Application Interface
Most OSs handle removable disks almost exactly like fixed disks — a new cartridge is
formatted and an empty file system is generated on the disk.
Tapes are presented as a raw storage medium, i.e., an application does not open a
file on the tape, it opens the whole tape drive as a raw device.
Usually the tape drive is reserved for the exclusive use of that application.
Since the OS does not provide file system services, the application must decide how to
use the array of blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of
data can generally only be used by the program that created it.
Tape Drives
The basic operations for a tape drive differ from those of a disk drive.
locate positions the tape to a specific logical block, not an entire track (corresponds to
seek).
The read position operation returns the logical block number where the tape head is.
The space operation enables relative motion.
Tape drives are “append-only” devices; updating a block in the middle of the tape also
effectively erases everything beyond that block.
An EOT mark is placed after a block that is written.
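The operations above can be modelled with a small in-memory sketch. The class `TapeDrive` is hypothetical; the end of the Python list stands in for the EOT mark, so writing in the middle of the tape discards every block from that position onward.

```python
class TapeDrive:
    """Toy model of a tape: a list of blocks whose end acts as the EOT mark."""

    def __init__(self):
        self.blocks = []
        self.position = 0

    def locate(self, block_number):
        """Position the tape at a specific logical block (like a disk seek)."""
        if not 0 <= block_number <= len(self.blocks):
            raise ValueError("block is beyond the EOT mark")
        self.position = block_number

    def read_position(self):
        """Return the logical block number under the tape head."""
        return self.position

    def space(self, count):
        """Move relative to the current position (negative moves backward)."""
        self.locate(self.position + count)

    def write(self, data):
        """Write one block; everything from the current position on is lost."""
        del self.blocks[self.position:]
        self.blocks.append(data)
        self.position += 1


tape = TapeDrive()
for payload in (b"a", b"b", b"c"):
    tape.write(payload)
tape.locate(1)
tape.write(b"X")             # overwrites block 1 and erases block 2
print(tape.blocks)           # [b'a', b'X']
print(tape.read_position())  # 2
```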
File Naming
The issue of naming files on removable media is especially difficult when we want to
write data on a removable cartridge on one computer, and then use the cartridge in
another computer.
Contemporary OSs generally leave the name space problem unsolved for removable
media, and depend on applications and users to figure out how to access and interpret the
data.
Some kinds of removable media (e.g., CDs) are so well standardized that all
computers use them the same way.
Hierarchical Storage Management (HSM)
A hierarchical storage system extends the storage hierarchy beyond primary memory
and secondary storage to incorporate tertiary storage — usually implemented as a
jukebox of tapes or removable disks.
Tertiary storage is usually incorporated by extending the file system.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to the jukebox.
HSM is usually found in supercomputing centers and other large installations that have
enormous volumes of data.
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth – average data rate during a large transfer (number of bytes / transfer
time); the data rate when the data stream is actually flowing.
Effective bandwidth – average over the entire I/O time, including seek or locate and
cartridge switching; the drive's overall data rate.
Access latency – amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational
latency; < 35 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape
head; tens or hundreds of seconds.
As a rule of thumb, random access within a tape cartridge is about a thousand times
slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a
few expensive drives.
A removable library is best devoted to the storage of infrequently used data, because
the library can only satisfy a relatively small number of I/O requests per hour.
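The difference between sustained and effective bandwidth is easiest to see with numbers. The figures below (a 1 GB transfer, a 10 MB/s drive, 60 seconds of cartridge switching and locating) are hypothetical, chosen only to make the arithmetic visible.

```python
def sustained_bandwidth(bytes_moved, transfer_seconds):
    """Data rate while the data stream is actually flowing."""
    return bytes_moved / transfer_seconds


def effective_bandwidth(bytes_moved, transfer_seconds, overhead_seconds):
    """Data rate averaged over the whole I/O, including locate and
    cartridge-switch time."""
    return bytes_moved / (transfer_seconds + overhead_seconds)


size = 1_000_000_000  # bytes to transfer
transfer = 100.0      # seconds spent streaming data
overhead = 60.0       # seconds spent switching cartridges and locating

print(sustained_bandwidth(size, transfer))            # 10000000.0
print(effective_bandwidth(size, transfer, overhead))  # 6250000.0
```

The shorter the transfer, the more the fixed overhead dominates, which is why a library suits large, infrequent requests.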
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data, whereas the failure of a
tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only
one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage
capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably
larger than the number of drives.
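The last point follows from spreading the drive price over all cartridges. A sketch with made-up prices (the 1000 and 25 currency units and the 100 GB cartridge size are assumptions, not market data):

```python
def cost_per_gb(drive_price, cartridge_price, cartridge_gb, cartridges, drives=1):
    """Total hardware cost divided by total capacity."""
    total_cost = drives * drive_price + cartridges * cartridge_price
    return total_cost / (cartridges * cartridge_gb)


# Hypothetical prices: drive 1000, cartridge 25, 100 GB per cartridge.
print(cost_per_gb(1000, 25, 100, cartridges=1))    # 10.25
print(cost_per_gb(1000, 25, 100, cartridges=100))  # 0.35
```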
Price per Megabyte of DRAM, From 1981 to 2004
Price per Megabyte of Magnetic Hard Disk, From 1981 to 2004
Price per Megabyte of a Tape Drive, From 1984-2000
I/O Systems
I/O Hardware
Application I/O Interface
Kernel I/O Subsystem
Transforming I/O Requests to Hardware Operations
Streams
Performance
Objectives
Explore the structure of an operating system’s I/O subsystem
Discuss the principles of I/O hardware and its complexity
Provide details of the performance aspects of I/O hardware and software
I/O Hardware
A Typical PC Bus Structure
Device I/O Port Locations on PCs (partial)
Polling
Determines state of device
command-ready
busy
error
Busy-wait cycle to wait for I/O from device
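The busy-wait cycle can be illustrated with a simulated device. Everything here is hypothetical: a real driver would read a hardware status register, whereas `SimulatedDevice` simply reports "busy" a fixed number of times before becoming ready.

```python
class SimulatedDevice:
    """Toy device whose busy state clears after a fixed number of status reads."""

    def __init__(self, busy_reads):
        self._remaining = busy_reads
        self.data = None

    def status(self):
        if self._remaining > 0:
            self._remaining -= 1
            return "busy"
        self.data = 0x41  # the byte becomes available
        return "command-ready"


def poll_read(device):
    """Spin until the device is ready; return (data, number of wasted polls)."""
    polls = 0
    while device.status() == "busy":  # busy-wait: the CPU does no other work
        polls += 1
    return device.data, polls


print(poll_read(SimulatedDevice(busy_reads=3)))  # (65, 3)
```

Polling is cheap when the device is usually ready, but the wasted iterations are the motivation for interrupts.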
Interrupts
CPU Interrupt-request line triggered by I/O device
Interrupt handler receives interrupts
Maskable to ignore or delay some interrupts
Interrupt vector to dispatch interrupt to correct handler
Based on priority
Some nonmaskable
Interrupt mechanism also used for exceptions
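Dispatch through an interrupt vector, together with masking, can be sketched as a table lookup. This is a software toy, not how any particular CPU works: the class name, the interrupt numbers, and the choice of number 2 as nonmaskable are assumptions, and priority handling is left out.

```python
NONMASKABLE = {2}  # hypothetical: interrupt 2 can never be masked


class InterruptController:
    def __init__(self):
        self.vector = {}    # interrupt number -> handler
        self.masked = set()
        self.deferred = []  # masked interrupts waiting to be delivered

    def register(self, number, handler):
        self.vector[number] = handler

    def mask(self, number):
        if number not in NONMASKABLE:
            self.masked.add(number)

    def unmask(self, number):
        """Unmask and deliver any interrupts deferred for this number."""
        self.masked.discard(number)
        ready = [n for n in self.deferred if n == number]
        self.deferred = [n for n in self.deferred if n != number]
        return [self.vector[n]() for n in ready]

    def raise_interrupt(self, number):
        if number in self.masked:
            self.deferred.append(number)  # delayed, not lost
            return None
        return self.vector[number]()      # dispatch via the vector


ic = InterruptController()
ic.register(2, lambda: "nonmaskable handled")
ic.register(33, lambda: "keyboard handled")
ic.mask(33)
ic.mask(2)                     # has no effect
print(ic.raise_interrupt(33))  # None
print(ic.raise_interrupt(2))   # nonmaskable handled
print(ic.unmask(33))           # ['keyboard handled']
```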
Interrupt-Driven I/O Cycle
Intel Pentium Processor Event-Vector Table
Direct Memory Access
Used to avoid programmed I/O for large data movement
Requires DMA controller
Bypasses CPU to transfer data directly between I/O device and memory
Six Step Process to Perform DMA Transfer
Application I/O Interface
I/O system calls encapsulate device behaviors in generic classes
Device-driver layer hides differences among I/O controllers from kernel
Devices vary in many dimensions
Character-stream or block
Sequential or random-access
Sharable or dedicated
Speed of operation
read-write, read only, or write only
A Kernel I/O Structure
Characteristics of I/O Devices
Block and Character Devices
Block devices include disk drives
Commands include read, write, seek
Raw I/O or file-system access
Memory-mapped file access possible
Character devices include keyboards, mice, serial ports
Commands include get, put
Libraries layered on top allow line editing
Network Devices
Varying enough from block and character to have own interface
Unix and Windows NT/9x/2000 include socket interface
Separates network protocol from network operation
Includes select functionality
Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
Clocks and Timers
Provide current time, elapsed time, timer
Programmable interval timer used for timings, periodic interrupts
ioctl (on UNIX) covers odd aspects of I/O such as clocks and timers
Blocking and Nonblocking I/O
Blocking - process suspended until I/O completed
Easy to use and understand
Insufficient for some needs
Nonblocking - I/O call returns as much as available
User interface, data copy (buffered I/O)
Implemented via multi-threading
Returns quickly with count of bytes read or written
Asynchronous - process runs while I/O executes
Difficult to use
I/O subsystem signals process when I/O completed
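The nonblocking case can be tried on a POSIX system with a pipe. The sketch assumes Python 3.5 or later on a Unix-like OS, where `os.set_blocking` is available; `try_read` is a hypothetical helper.

```python
import os

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # reads now return immediately


def try_read(fd, size=1024):
    """Return the bytes available right now, or None if the read would block."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


first = try_read(read_fd)   # nothing has been written yet
os.write(write_fd, b"hello")
second = try_read(read_fd)  # data is available, returned at once

os.close(read_fd)
os.close(write_fd)
print(first, second)  # None b'hello'
```

With a blocking descriptor, the first read would instead suspend the process until data arrived.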
Two I/O Methods
Kernel I/O Subsystem
Scheduling
Some I/O request ordering via per-device queue
Some OSs try fairness
Buffering - store data in memory while transferring between devices
To cope with device speed mismatch
To cope with device transfer size mismatch
To maintain “copy semantics”
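Copy semantics means that the data written is the version that existed at the time of the write call, even if the application changes its buffer afterwards. A toy sketch: the class `KernelWriteQueue` is hypothetical and stands in for a kernel buffer that snapshots the application's data.

```python
class KernelWriteQueue:
    """Toy kernel buffer that copies application data at the time of the call."""

    def __init__(self):
        self.pending = []

    def write(self, app_buffer):
        self.pending.append(bytes(app_buffer))  # snapshot, not a reference

    def flush(self):
        """Hand the buffered data to the device and empty the queue."""
        written, self.pending = self.pending, []
        return written


kernel = KernelWriteQueue()
buf = bytearray(b"version 1")
kernel.write(buf)
buf[-1:] = b"2"         # the application reuses its buffer
print(bytes(buf))       # b'version 2'
print(kernel.flush())   # [b'version 1']
```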
Device-status Table
Sun Enterprise 6000 Device-Transfer Rates
Caching - fast memory holding copy of data
Always just a copy
Key to performance
Spooling - hold output for a device
If device can serve only one request at a time
e.g., printing
Device reservation - provides exclusive access to a device
System calls for allocation and deallocation
Watch out for deadlock
Error Handling
OS can recover from disk read, device unavailable, transient write failures
Most return an error number or code when I/O request fails
System error logs hold problem reports
I/O Protection
User process may accidentally or purposefully attempt to disrupt normal operation
via illegal I/O instructions
All I/O instructions defined to be privileged
I/O must be performed via system calls
Memory-mapped and I/O port memory locations must be protected too
Use of a System Call to Perform I/O
Kernel Data Structures
Kernel keeps state info for I/O components, including open file tables, network
connections, character device state
Many, many complex data structures to track buffers, memory allocation, “dirty” blocks
Some use object-oriented methods and message passing to implement I/O
UNIX I/O Kernel Structure
I/O Requests to Hardware Operations
Consider reading a file from disk for a process:
Determine device holding file
Translate name to device representation
Physically read data from disk into buffer
Make data available to requesting process
Return control to process
Life Cycle of An I/O Request
STREAMS
STREAM – a full-duplex communication channel between a user-level process and a
device in Unix System V and beyond
A STREAM consists of:
- STREAM head interfaces with the user process
- driver end interfaces with the device
- zero or more STREAM modules between them.
Each module contains a read queue and a write queue
Message passing is used to communicate between queues
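The structure can be pictured as a chain of modules, each with its own pair of queues. The sketch below is a loose model, not the System V STREAMS API: `StreamModule`, `send_downstream`, and `send_upstream` are hypothetical names, and each module applies an optional transformation as a message passes through.

```python
from collections import deque


class StreamModule:
    """One module: a write queue (toward the device) and a read queue
    (toward the user process)."""

    def __init__(self, name, on_write=None, on_read=None):
        self.name = name
        self.write_queue = deque()
        self.read_queue = deque()
        self.on_write = on_write or (lambda message: message)
        self.on_read = on_read or (lambda message: message)


def send_downstream(modules, message):
    """Pass a message from the stream head to the driver end."""
    for module in modules:
        module.write_queue.append(message)
        message = module.on_write(module.write_queue.popleft())
    return message


def send_upstream(modules, message):
    """Pass a message from the driver end back to the stream head."""
    for module in reversed(modules):
        module.read_queue.append(message)
        message = module.on_read(module.read_queue.popleft())
    return message


stream = [
    StreamModule("head"),
    StreamModule("case", on_write=str.upper, on_read=str.lower),
    StreamModule("driver"),
]
print(send_downstream(stream, "ping"))  # PING
print(send_upstream(stream, "PONG"))    # pong
```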
The STREAMS Structure
Performance
I/O is a major factor in system performance:
Demands CPU to execute device driver, kernel I/O code
Context switches due to interrupts
Data copying
Network traffic especially stressful
Intercomputer Communications
Improving Performance
Reduce number of context switches
Reduce data copying
Reduce interrupts by using large transfers, smart controllers, polling
Use DMA
Balance CPU, memory, bus, and I/O performance for highest throughput
Device-Functionality Progression
***The End***