Slides for Chapter 6: Operating System support
From Coulouris, Dollimore and Kindberg
Distributed Systems: Concepts and Design
Edition 3, © Addison-Wesley 2001
Outline
Introduction
The operating system layer
Protection
Processes and threads
Communication and invocation
Operating system architecture
Summary
6.1 Introduction
In this chapter we shall continue to focus on remote invocations without real-time guarantees
An important theme of the chapter is the role of the system kernel
The chapter aims to give the reader an understanding of the advantages and disadvantages of splitting functionality between protection domains (kernel and user-level code)
We shall examine the relationship between the operating system layer and the middleware layer, and in particular how well the requirements of middleware can be met by the operating system
Efficient and robust access to physical resources
The flexibility to implement a variety of resource-management policies
Introduction (2)
The task of any operating system is to provide problem-oriented abstractions of the underlying physical resources (for example, sockets rather than raw network access)
The processors
Memory
Communications
Storage media
The system call interface takes over the physical resources on a single node and manages them to present these resource abstractions
Introduction (3)
Network operating systems
They have a network capability built into them and so can be used to access remote resources. Access is network-transparent for some – not all – types of resource.
Multiple system images
The nodes running a network operating system retain autonomy in managing their own processing resources
Single system image
One could envisage an operating system in which users are never concerned with where their programs run, or the location of any resources. The operating system has control over all the nodes in the system
An operating system that produces a single system image like this for all the resources in a distributed system is called a distributed operating system
Introduction (4) --Middleware and network operating systems
In fact, there are no distributed operating systems in general use, only network operating systems
The first reason is that users have much invested in their application software, which often meets their current problem-solving needs
The second reason against the adoption of distributed operating systems is that users tend to prefer to have a degree of autonomy for their machines, even in a closely knit organization
The combination of middleware and network operating systems provides an acceptable balance between the requirement for autonomy on the one hand and network-transparent resource access on the other
Figure 6.1 System layers
From top to bottom: Applications, services; Middleware; OS: kernel, libraries & servers (OS1 on Node 1 and OS2 on Node 2, each providing processes, threads, communication, ...); Computer & network hardware. The OS layer and the computer & network hardware together form the platform.
The operating system layer
Our goal in this chapter is to examine the impact of particular OS mechanisms on middleware’s ability to deliver distributed resource sharing to users
Kernels and server processes are the components that manage resources and present clients with an interface to the resources. They must provide at least the following:
Encapsulation: provide a useful service interface to their resources
Protection
Concurrent processing
Communication
Scheduling
Figure 6.2 Core OS functionality
Process manager: handles the creation of and operations upon processes
Thread manager: thread creation, synchronization and scheduling
Communication manager: communication between threads attached to different processes on the same computer
Memory manager: management of physical and virtual memory
Supervisor: dispatching of interrupts, system call traps and other exceptions
6.3 Protection
We said above that resources require protection from illegitimate accesses. Note that the threat to a system’s integrity does not come only from maliciously contrived code. Benign code that contains a bug or which has unanticipated behavior may cause part of the rest of the system to behave incorrectly.
Protecting the file consists of two sub-problems
The first is to ensure that each of the file’s two operations (read and write) can be performed only by clients with the right to perform it
The other type of illegitimate access, which we shall address here, is where a misbehaving client sidesteps the operations that a resource exports
We can protect resources from illegitimate invocations such as setFilePointerRandomly – a meaningless operation that would upset normal use of the file and that files would never be designed to export – for example by using a type-safe programming language (Java or Modula-3)
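As a toy illustration (not from the book) of exporting only a resource's legitimate operations, the sketch below wraps a file behind a class that offers read and write only; the class name, method names and file contents are invented. Python cannot enforce this boundary the way a type-safe language such as Java or Modula-3 can, so this only sketches the interface discipline.

```python
import tempfile

class FileResource:
    """A file resource that exports only read and write.

    The internal file object and its offset are never exposed, so an
    operation such as setFilePointerRandomly simply does not exist in
    the exported interface."""

    def __init__(self, path: str):
        self._file = open(path, "r+b")   # internal state, not exported

    def read(self, n: int) -> bytes:
        return self._file.read(n)        # read n bytes from the current position

    def write(self, data: bytes) -> int:
        return self._file.write(data)    # write at the current position

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    f = FileResource(tmp.name)
    print(f.read(5))                     # b'hello'
```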
Kernel and Protection
The kernel is a program that is distinguished by the facts that it always runs and its code is executed with complete access privileges for the physical resources on its host computer
A kernel process executes with the processor in supervisor (privileged) mode; the kernel arranges that other processes execute in user (unprivileged) mode
A kernel also sets up address spaces to protect itself and other processes from the accesses of an aberrant process, and to provide processes with their required virtual memory layout
The process can safely transfer from a user-level address space to the kernel’s address space via an exception such as an interrupt or a system call trap
6.4 Processes and threads
A thread is the operating system abstraction of an activity (the term derives from the phrase “thread of execution”)
An execution environment is the unit of resource management: a collection of local kernel-managed resources to which its threads have access
An execution environment primarily consists of
An address space
Thread synchronization and communication resources, such as semaphores and communication interfaces
Higher-level resources, such as open files and windows
6.4.1 Address spaces
An address space consists of regions, separated by inaccessible areas of virtual memory
Regions do not overlap. Each region is specified by the following properties
Its extent (lowest virtual address and size)
Read/write/execute permissions for the process’s threads
Whether it can be grown upwards or downwards
Figure 6.3 Address space
A process’s address space extends from address 0 to 2^N and contains a Text region, a Heap, Auxiliary regions and a Stack.
6.4.1 Address spaces (2)
A mapped file is one that is accessed as an array of bytes in memory. The virtual memory system ensures that accesses made in memory are reflected in the underlying file storage
A shared memory region is one that is backed by the same physical memory as one or more regions belonging to other address spaces
The uses of shared regions include the following
Libraries
Kernel
Data sharing and communication
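A minimal sketch of a mapped file, assuming Python's standard mmap module (the file name and sizes are invented): bytes written through the memory mapping are reflected in the underlying file storage, and two processes that mapped the same file would share the corresponding physical memory.

```python
import mmap, os, tempfile

# Create a small file to back the mapping.
path = os.path.join(tempfile.gettempdir(), "mapped_demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)               # one page of zeroes

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), 4096)   # map the file into the address space
    region[0:5] = b"hello"                 # an ordinary memory access...
    region.flush()                         # ...reflected in the underlying file
    region.close()

with open(path, "rb") as f:
    print(f.read(5))                       # b'hello'
```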
6.4.2 Creation of a new process
The creation of a new process has traditionally been an indivisible operation provided by the operating system. For example, the UNIX fork system call.
For a distributed system, the design of the process creation mechanism has to take account of the utilization of multiple computers
The creation of a new process can be separated into two independent aspects
The choice of a target host
The creation of an execution environment
Choice of process host
The choice of node at which the new process will reside – the process allocation decision – is a matter of policy
Transfer policy
Determines whether to situate a new process locally or remotely, for example depending on whether the local node is lightly or heavily loaded
Location policy
Determines which node should host a new process selected for transfer. This decision may depend on the relative loads of nodes, on their machine architectures and on any specialized resources they may possess
Choice of process host (2)
Process location policies may be static or adaptive
Load managers collect information about the nodes and use it to allocate new processes to nodes
Load-sharing systems may be
Centralized: one load manager component
Hierarchical: several load managers organized in a tree structure
Decentralized: nodes exchange information with one another directly to make allocation decisions
Choice of process host (3)
In sender-initiated load-sharing algorithms, the node that requires a new process to be created is responsible for initiating the transfer decision
In receiver-initiated algorithms, a node whose load is below a given threshold advertises its existence to other nodes so that more heavily loaded nodes will transfer work to it
Migratory load-sharing systems can shift load at any time, not just when a new process is created. They use a mechanism called process migration
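A toy sketch of a sender-initiated policy, not taken from the book: the transfer policy compares the local load against a threshold, and the location policy picks the least-loaded remote node. The threshold, node names and load figures are invented.

```python
THRESHOLD = 0.8   # assumed local-load threshold

def transfer_policy(local_load: float) -> bool:
    """Sender-initiated: create remotely only if the local node is heavily loaded."""
    return local_load > THRESHOLD

def location_policy(remote_loads: dict) -> str:
    """Pick the least-loaded candidate node for the transferred process."""
    return min(remote_loads, key=remote_loads.get)

local_load = 0.9
nodes = {"node2": 0.35, "node3": 0.6, "node4": 0.2}
if transfer_policy(local_load):
    print("create process on", location_policy(nodes))   # node4
else:
    print("create process locally")
```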
Creation of a new execution environment
There are two approaches to defining and initializing the address space of a newly created process
Where the address space is of statically defined format
For example, it could contain just a program text region, heap region and stack region
Address space regions are initialized from an executable file or filled with zeroes as appropriate
The address space can be defined with respect to an existing execution environment
For example the newly created child process physically shares the parent’s text region, and has heap and stack regions that are copies of the parent’s in extent (as well as in initial contents)
When parent and child share a region, the page frames belonging to the parent’s region are mapped simultaneously into the corresponding child region
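The UNIX fork semantics described above can be observed with a short, Unix-only sketch: after fork the child starts with copies of the parent's heap and stack contents, so a modification made in the child is not visible to the parent. Whether the kernel copies eagerly or lazily via copy-on-write is invisible to the program.

```python
import os

data = bytearray(b"parent data")      # lives in the parent's heap region

pid = os.fork()                        # child receives a (copy-on-write) copy of the address space
if pid == 0:
    data[0:6] = b"child "              # modify the child's private copy
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print(data.decode())               # still "parent data": the parent's region is unchanged
```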
Figure 6.4 Copy-on-write: a) before write, b) after write
Region RA in process A’s address space and region RB in process B’s address space are initially backed by the same shared frames: both A’s page table and B’s page table point to them, and the pages are write-protected at the hardware level. A write to a shared page therefore causes a page fault; the page fault handler allocates a new frame for process B and copies the original frame’s data into it byte by byte, after which RB is backed by its own copy (RB copied from RA).
6.4.3 Threads
A thread is a lightweight form of a process: it contains the information needed to use the CPU – a program counter, a register set and stack space. Threads of the same program (task) share the code section, data section and operating system resources. An operating system that can run several threads at once is said to support multithreading.
The next key aspect of a process to consider in more detail is the ability of client and server processes to possess more than one thread.
Figure 6.5 Client and server with threads
At the client, Thread 1 generates results and Thread 2 makes requests to the server. At the server (worker pool architecture), an input-output thread handles the receipt and queuing of requests, and a pool of N worker threads removes requests from the queue and executes them.
A disadvantage of this architecture is its inflexibility
Another disadvantage is the high level of switching between the I/O and worker threads as they manipulate the shared queue
Figure 6.6 Alternative server threading architectures (see also Figure 6.5)
a. Thread-per-request: the I/O thread spawns a new worker thread for each remote request; the workers invoke the remote objects directly
Advantage: the threads do not contend for a shared queue, and throughput is potentially maximized
Disadvantage: the overheads of the thread creation and destruction operations
b. Thread-per-connection: associates a thread with each connection
c. Thread-per-object: associates a thread with each object
In each of these last two architectures the server benefits from lowered thread-management overheads compared with the thread-per-request architecture
Their disadvantage is that clients may be delayed while a worker thread has several outstanding requests but another thread has no work to perform
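A minimal thread-per-request sketch, assuming Python's socket and threading modules (the port number is invented, and each accepted connection is treated as carrying a single request): an input-output thread accepts requests and spawns a fresh worker thread for each one, so workers never contend for a shared queue, at the cost of one thread creation and destruction per request.

```python
import socket, threading

def handle(conn: socket.socket) -> None:
    """Worker: serve one request, then terminate (thread-per-request)."""
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"echo: " + request)

def serve(port: int = 9000) -> None:
    with socket.socket() as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", port))
        server.listen()
        while True:                          # I/O thread: receipt of requests
            conn, _addr = server.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

# serve()   # uncomment to run the sketch
```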
Figure 6.7 State associated with execution environments and threads
Execution environment: address space tables; communication interfaces, open files; semaphores, other synchronization objects; list of thread identifiers; pages of address space resident in memory; hardware cache entries
Thread: saved processor registers; priority and execution state (such as BLOCKED); software interrupt handling information; execution environment identifier
A comparison of processes and threads
Creating a new thread within an existing process is cheaper than creating a process.
More importantly, switching to a different thread within the same process is cheaper than switching between threads belonging to different processes.
Threads within a process may share data and other resources conveniently and efficiently compared with separate processes.
But, by the same token, threads within a process are not protected from one another.
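A rough micro-benchmark, not from the book, gives a feel for the cost difference; absolute numbers depend heavily on the operating system, but creating and joining a trivial thread is typically far cheaper than creating and joining a whole process with its own execution environment.

```python
import time, threading, multiprocessing

def noop():
    pass

def time_it(make_worker, n: int = 50) -> float:
    """Average time to create, start and join a worker that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        w = make_worker(target=noop)
        w.start()
        w.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":            # guard required for multiprocessing on some platforms
    print("thread  :", time_it(threading.Thread))
    print("process :", time_it(multiprocessing.Process))
```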
A comparison of processes and threads (2)
The overheads associated with creating a process are in general considerably greater than those of creating a new thread. A new execution environment must first be created, including address space tables
The second performance advantage of threads concerns switching between threads – that is, running one thread instead of another within a given process
A context switch is the transition between contexts that takes place when switching between threads, or when a single thread makes a system call or takes another type of exception
It involves the following:
The saving of the processor’s original register state, and loading of the new state
In some cases, a transfer to a new protection domain – this is known as a domain transition
Thread scheduling
In preemptive scheduling, a thread may be suspended at any point to make way for another thread
In non-preemptive scheduling, a thread runs until it makes a call to the threading system (for example, a system call).
The advantage of non-preemptive scheduling is that any section of code that does not contain a call to the threading system is automatically a critical section. Race conditions are thus conveniently avoided
Non-preemptively scheduled threads cannot take advantage of a multiprocessor, since they run exclusively
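Python's asyncio event loop gives a flavour of non-preemptive scheduling (an analogy, not the book's example): a coroutine runs until it reaches an await, so code between two awaits is automatically a critical section with respect to other coroutines, and, as noted above, such tasks run exclusively rather than on several processors.

```python
import asyncio

counter = 0

async def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # No await inside this statement: the read-modify-write cannot be
        # interleaved with another coroutine, so no lock is needed.
        counter += 1
        await asyncio.sleep(0)     # explicit yield point back to the scheduler

async def main() -> None:
    await asyncio.gather(increment(10_000), increment(10_000))
    print(counter)                 # always 20000

asyncio.run(main())
```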
Thread implementation
When no kernel support for multi-threaded processes is provided, a user-level threads implementation suffers from the following problems
The threads within a process cannot take advantage of a multiprocessor
A thread that takes a page fault blocks the entire process and all threads within it
Threads within different processes cannot be scheduled according to a single scheme of relative prioritization
Thread implementation (2)
User-level threads implementations have significant advantages over kernel-level implementations Certain thread operations are significantly less costly
For example, switching between threads belonging to the same process does not necessarily involve a system call – that is, a relatively expensive trap to the kernel
Given that the thread-scheduling module is implemented outside the kernel, it can be customized or changed to suit particular application requirements. Variations in scheduling requirements occur largely because of application-specific considerations such as the real-time nature of multimedia processing
Many more user-level threads can be supported than could reasonably be provided by default by a kernel
The four types of event that the kernel notifies to the user-level scheduler
Virtual processor allocated: the kernel has assigned a new virtual processor to the process, and this is the first timeslice upon it; the scheduler can load the SA with the context of a READY thread, which can thus recommence execution
SA blocked: an SA has blocked in the kernel, and the kernel is using a fresh SA to notify the scheduler; the scheduler sets the state of the corresponding thread to BLOCKED and can allocate a READY thread to the notifying SA
SA unblocked: an SA that was blocked in the kernel has become unblocked and is ready to execute at user level again; the scheduler can now return the corresponding thread to the READY list. In order to create the notifying SA, the kernel either allocates a new virtual processor to the process or preempts another SA in the same process. In the latter case, it also communicates the preemption event to the scheduler, which can re-evaluate its allocation of threads to SAs.
SA preempted The kernel has taken away the specified SA from the process (although it may
do this to allocate a processor to a fresh SA in the same process); the scheduler places the preempted thread in the READY list and re-evaluates the thread allocation.
Figure 6.10 Scheduler activations
A. Assignment of virtual processors to processes: the kernel assigns virtual processors to processes A and B; a processor may be idle (P idle), needed (P needed) or added (P added).
B. Events between the user-level scheduler and the kernel: SA blocked, SA unblocked, SA preempted.
Key: P = processor; SA = scheduler activation. A scheduler activation (SA) is a call from the kernel to a process.
6.5.1 Invocation performance
Invocation performance is a critical factor in distributed system design
Network technologies continue to improve, but invocation times have not decreased in proportion with increases in network bandwidth
This section will explain how software overheads often predominate over network overheads in invocation times
Figure 6.11Invocations between address spaces
Control transfer viatrap instruction
User Kernel
Thread
User 1 User 2
Control transfer viaprivileged instructions
Thread 1 Thread 2
Protection domainboundary
(a) System call
(b) RPC/RMI (within one computer)
Kernel
(c) RPC/RMI (between computers)
User 1 User 2
Thread 1 Network Thread 2
Kernel 2Kernel 1
Figure 6.12 RPC delay against parameter size
Client delay plotted against requested data size (0 to 2000 bytes). The delay is roughly proportional to the size until the size reaches a threshold at about the network packet size.
The following are the main components accounting for remote invocation delay, besides network transmission times
Marshalling: marshalling and unmarshalling, which involve copying and converting data, become a significant overhead as the amount of data grows
Data copying: potentially, even after marshalling, message data is copied several times in the course of an RPC
1. Across the user-kernel boundary, between the client or server address space and kernel buffers
2. Across each protocol layer (for example, RPC/UDP/IP/Ethernet)
3. Between the network interface and kernel buffers
Packet initialization: this involves initializing protocol headers and trailers, including checksums. The cost is therefore proportional, in part, to the amount of data sent
Thread scheduling and context switching:
1. Several system calls (that is, context switches) are made during an RPC, as stubs invoke the kernel’s communication operations
2. One or more server threads is scheduled
3. If the operating system employs a separate network manager process, then each Send involves a context switch to one of its threads
Waiting for acknowledgements: the choice of RPC protocol may influence delay, particularly when large amounts of data are sent
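The shape of Figure 6.12 can be reproduced roughly with the sketch below, which times simple request-reply exchanges over a local TCP connection for increasing reply sizes; the port, header format and payload sizes are invented, and the loop assumes the 8-byte size header arrives in one piece.

```python
import socket, threading, time

HOST, PORT = "127.0.0.1", 9901                 # assumed local test endpoint

def server() -> None:
    with socket.socket() as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((HOST, PORT)); s.listen()
        conn, _ = s.accept()
        with conn:
            while True:
                header = conn.recv(8)          # 8-byte requested-size header
                if not header:
                    break
                size = int(header.decode())
                conn.sendall(b"x" * size)      # reply with `size` bytes

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                                # let the server start

with socket.create_connection((HOST, PORT)) as c:
    for size in (100, 500, 1000, 1500, 2000):  # requested data sizes in bytes
        t0 = time.perf_counter()
        c.sendall(f"{size:08d}".encode())
        received = 0
        while received < size:
            received += len(c.recv(65536))
        print(size, "bytes:", round((time.perf_counter() - t0) * 1e6), "microseconds")
```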
A lightweight remote procedure call
The LRPC design is based on optimizations concerning data copying and thread scheduling.
Client and server are able to pass arguments and values directly via an A stack. The same stack is used by the client and server stubs
In LRPC, arguments are copied once: when they are marshalled onto the A stack. In an equivalent RPC, they are copied four times
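The single-copy idea behind the A stack can be imitated at user level with a shared memory region; the sketch below uses Python's multiprocessing.shared_memory as a loose analogy only (LRPC is a kernel-supported mechanism, and the buffer layout here is invented). The caller marshals its argument directly into memory that the callee also has mapped, so no further copying is needed to pass it.

```python
from multiprocessing import Process, shared_memory

def callee(name: str) -> None:
    shm = shared_memory.SharedMemory(name=name)   # map the same region
    arg = bytes(shm.buf[:5])                      # read the argument in place
    shm.buf[5:10] = arg.upper()                   # write the result in place
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b"hello"                        # "marshal" the argument once
    p = Process(target=callee, args=(shm.name,))
    p.start(); p.join()
    print(bytes(shm.buf[5:10]))                   # b'HELLO'
    shm.close(); shm.unlink()
```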
Figure 6.13 A lightweight remote procedure call
1. The client stub copies the arguments onto the A stack, which is shared by the client and server stubs. 2. The client traps to the kernel. 3. The kernel makes an upcall into the server stub. 4. The server executes the procedure and copies the results onto the A stack. 5. Return to the client via a trap to the kernel.
6.5.2 Asynchronous operation
A common technique to defeat high latencies is asynchronous operation, which arises in two programming models: concurrent invocations and asynchronous invocations
An asynchronous invocation is one that is performed asynchronously with respect to the caller. That is, it is made with a non-blocking call, which returns as soon as the invocation request message has been created and is ready for dispatch
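Both models can be sketched with Python's concurrent.futures; the invoke function and its 50 ms delay are stand-ins for a real remote invocation. Concurrent invocations keep several calls outstanding from a pool of threads, while an asynchronous invocation returns a future immediately and the caller claims the result only when it is needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def invoke(request: str) -> str:
    time.sleep(0.05)                 # stand-in for network and server delay
    return request.upper()

with ThreadPoolExecutor(max_workers=4) as pool:
    # Concurrent invocations: several requests outstanding at once.
    t0 = time.perf_counter()
    results = list(pool.map(invoke, ["a", "b", "c", "d"]))
    print(results, round(time.perf_counter() - t0, 3), "s")   # about 0.05 s, not 0.2 s

    # Asynchronous invocation: the call returns at once; the result is claimed later.
    future = pool.submit(invoke, "e")
    print("doing other work while the invocation is in progress")
    print(future.result())           # block only when the result is needed
```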
Figure 6.14 Times for serialized and concurrent invocations
With serialized invocations the client marshals and sends each request, waits to receive and unmarshal the reply, and processes the results before issuing the next request, while the server executes each request in turn. With concurrent invocations the client marshals and sends the requests one after another without waiting, so the server’s execution of one request overlaps (pipelines) with the transmission and client-side processing of others, and the total elapsed time is reduced.
6.6 Operating system architecture
Run only that system software at each computer that is necessary for it to carry out its particular role in the system architecture
Allow the software implementing any particular service to be changed independently of other facilities
Allow for alternatives of the same service to be provided, when this is required to suit different users or applications
Introduce new services without harming the integrity of existing ones
Figure 6.15 Monolithic kernel and microkernel
Key: S1-S4 = server; kernel code and data; dynamically loaded server program. In a monolithic kernel the servers execute within the kernel’s code and data; in a microkernel only a minimal kernel remains, and the servers run as dynamically loaded server programs on top of it.
Where these designs differ primarily is in the decision as to what functionality belongs in the kernel and what is to be left to server processes that can be dynamically loaded to run on top of it
The microkernel provides only the most basic abstractions: principally address spaces, threads and local interprocess communication
Figure 6.16 The role of the microkernel
Middleware runs above subsystems (language support subsystems, an OS emulation subsystem and others), which in turn run on the microkernel, which runs on the hardware. The microkernel supports middleware via subsystems.
Comparison
The chief advantage of a microkernel-based operating system is its extensibility
A relatively small kernel is more likely to be free of bugs than one that is large and more complex
The advantage of a monolithic design is the relative efficiency with which operations can be invoked