vipul patel [email protected]. ideas … please …

Vipul [email protected]

Ideas… Please…

Some Funda… Computer Architecture for Interconnected Multi-Processor Systems

[Figure: Tightly coupled system: several CPUs connected to a shared memory through interconnecting hardware. Loosely coupled system: several processors, each with its own local memory, connected through an interconnecting network.]

Coming to the Point…

Tightly coupled system = parallel processing system.

Limitation: limited area and bandwidth.

Loosely coupled system = distributed system.

Processors can be geographically far apart.

Fully scalable; ideally no limits.

A distributed system is:

A collection of processors interconnected by a communication network.

Each processor has its own local memory and peripherals.

Communication between processors is by message passing over the communication network.

For a processor, its own resources are ‘local’, whereas other processors and their resources are ‘remote’.

A processor together with its resources is referred to as a node, site or machine.
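The definition above can be sketched as a toy model: every node keeps its own local memory, and the only way to reach a remote node's data is to send a message over the network. The names Network, Node and read_remote are hypothetical, chosen only for this illustration, and the "network" is just an in-process queue.

```python
from collections import deque


class Network:
    """Toy communication network: a queue of (destination, message) pairs."""

    def __init__(self):
        self.nodes = {}
        self.in_transit = deque()

    def attach(self, node):
        self.nodes[node.name] = node
        node.network = self

    def send(self, dest, message):
        self.in_transit.append((dest, message))

    def deliver_all(self):
        # Keep delivering until nothing is in transit; handlers may send replies.
        while self.in_transit:
            dest, message = self.in_transit.popleft()
            self.nodes[dest].receive(message)


class Node:
    """A processor with its own local memory; no memory is shared."""

    def __init__(self, name):
        self.name = name
        self.memory = {}    # local resources
        self.inbox = []     # replies received from remote nodes
        self.network = None

    def read_remote(self, dest, key):
        # Remote data is reachable only by sending a request message.
        self.network.send(dest, ("read", key, self.name))

    def receive(self, message):
        kind = message[0]
        if kind == "read":
            _, key, reply_to = message
            self.network.send(reply_to, ("value", key, self.memory.get(key)))
        elif kind == "value":
            self.inbox.append(message[1:])


net = Network()
a, b = Node("A"), Node("B")
net.attach(a)
net.attach(b)
b.memory["x"] = 42          # 'x' is local to B and remote for A
a.read_remote("B", "x")
net.deliver_all()
print(a.inbox)
```

Node A never touches B's memory directly; it only sees the value that B chose to send back.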

Some History…

In the past, computers were large and costly.

Punch card days…!!

Job setup, rather than the CPU, was the bottleneck.

Batch processing was introduced: batching similar jobs increased CPU throughput.

Offline processing improved I/O performance, and multiprogramming was introduced.

Time sharing was introduced to allow multiple users; this led to the concept of dumb terminals for multiple simultaneous interactions.

Advances in hardware technology made these things common after the early 70s and brought in minicomputers.

More on History…

Time sharing provided two basic ideas of DS:

Sharing of resources by multiple users.

Use of computers from different locations.

Later, dumb terminals were replaced by ‘intelligent’ terminals.

That led to ‘workstations’. The idea was again sharing of resources.

The limitation of distance was eliminated by advances in LAN and WAN.

And finally the baby was born (I mean the concept of DS…) in the late 70s.

And still that ‘baby’ is growing…!

Distributed Computing Models: Minicomputer Model

Interconnected minicomputers.

A simple extension of the centralized time-sharing model.

Several interactive terminals are connected to each minicomputer.

A user logged on to one computer can access remote resources on other computers.

Useful for resource sharing.

[Figure: four minicomputers, each with its own terminals, connected by a communication network.]

Distributed Computing Models: Workstation Model

Interconnected workstations.

Each workstation may have its own disk and resources and serves as a single-user computer.

Some may be idle, some may be busy.

The idea is to use the idle ones for the busy ones.

If a user logs on to a machine and the system finds that it doesn’t have sufficient capacity, then the load should be transferred to other machines and the result returned to the user… (So simple… ;-) )

[Figure: four workstations (WS) connected by a communication network.]

Distributed Computing Models (Workstation)

But it is not so simple, because:

How to find an idle workstation?

How will the process be transferred?

What happens if the idle workstation becomes busy?

Well, the first two you will learn after a few weeks; let’s talk about the third. There are three options:

Allow the remote process to share the resources of the workstation along with the owner’s own processes. Easy to implement, but then the owner may not get optimum performance.

Kill the remote process. But then what happens to the half-cooked work? It is wasted, and may leave things in an inconsistent state. So…

Migrate the remote process back to its home workstation, so execution can continue there. Complex to implement, because preemptive process migration comes into the picture.
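The three options for the third question can be compared in a small sketch. This is only a toy model under stated assumptions: Process, Workstation and owner_returns are hypothetical names, "progress" is an invented counter standing in for work already done, and real preemptive migration would have to move the whole process state rather than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    home: str            # workstation the process came from
    progress: int = 0    # units of work finished so far


@dataclass
class Workstation:
    name: str
    remote: list = field(default_factory=list)   # guest processes


def owner_returns(ws, policy, home_queues):
    """Handle guest processes when the owner starts using `ws` again."""
    if policy not in ("share", "kill", "migrate"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "share":
        # Guests keep running and compete with the owner's processes.
        return
    guests, ws.remote = ws.remote, []
    for proc in guests:
        if policy == "kill":
            proc.progress = 0    # half-done work is lost
        else:
            # Migrate: the process goes home with its progress intact.
            home_queues.setdefault(proc.home, []).append(proc)


queues = {}
ws = Workstation("W2", [Process("compile", home="W1", progress=7)])
owner_returns(ws, "migrate", queues)
print(ws.remote, queues["W1"][0].progress)
```

Only the migrate branch both frees the workstation and keeps the work done so far, which is why it is the most attractive option and also the hardest to implement.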

Distributed Computing Models: Workstation-Server Model

A network of minicomputers and workstations (some of which may be diskless).

Who provides the file system to diskless workstations? A diskful workstation or a minicomputer.

Other minicomputers may provide other services such as database, print etc.

Specialized machines for specialized services.

A user logs on to a workstation. For local needs the workstation is sufficient, but for specialized services the request goes to a server, and the server sends back a response.

The benefit is that “process migration” is not needed, so things are simple.
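The request-response interaction described above can be sketched as follows. FileServer and DisklessWorkstation are hypothetical names for this illustration, the request and response dictionaries are an invented message format, and a direct method call stands in for the network exchange.

```python
class FileServer:
    """Specialized machine that provides the file service."""

    def __init__(self):
        self.files = {}

    def handle(self, request):
        op, path = request["op"], request["path"]
        if op == "write":
            self.files[path] = request["data"]
            return {"status": "ok"}
        if op == "read":
            if path in self.files:
                return {"status": "ok", "data": self.files[path]}
            return {"status": "error", "reason": "no such file"}
        return {"status": "error", "reason": "unknown operation"}


class DisklessWorkstation:
    """Has no disk of its own; every file operation is a request to the server."""

    def __init__(self, server):
        self.server = server

    def save(self, path, data):
        reply = self.server.handle({"op": "write", "path": path, "data": data})
        return reply["status"] == "ok"

    def load(self, path):
        reply = self.server.handle({"op": "read", "path": path})
        return reply.get("data")


server = FileServer()
ws1 = DisklessWorkstation(server)
ws2 = DisklessWorkstation(server)
ws1.save("/home/u/notes.txt", "hello")
print(ws2.load("/home/u/notes.txt"))
```

Only requests and responses cross the network; no process ever moves, which is the simplification the last point refers to. A user can also sit at any workstation and see the same files.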

Distributed Computing Models: Advantages of the Workstation-Server Model vs. the Workstation Model

Cheaper (e.g. a few machines with large HDDs vs. a disk in every machine).

Maintenance and backup are easy.

Users have the flexibility to go to any workstation to access a service.

It uses a request-response protocol, so there is no need for process migration.

The client-server model is well accepted and programmatically easy to implement.

A user has guaranteed response time, because workstations are not used to run remote processes.

However, this architecture makes no use of idle resources.

Distributed Computing Models: Processor Pool Model

The funda is: a user doesn’t need full processing power all the time, but may require it for a short duration (e.g. compilation).

So in this model processors are pooled together, to be shared by users as needed.

The pool consists of a large number of microcomputers and minicomputers attached to the network.

Each processor in the pool has its own memory to load and run a program.

No processor in the pool has terminals directly attached to it; user access is via special terminals (e.g. X terminals). A special server (the run server) handles demand and supply of the pool.

[Figure: terminals, a run server, a file server and the processor pool, all connected by a communication network.]
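The run server's job of matching demand and supply can be sketched as a simple allocator. RunServer, allocate and release are hypothetical names, and the all-or-nothing allocation policy is an assumption made for this sketch, not something the model prescribes.

```python
class RunServer:
    """Hands out processors from the pool on request and takes them back."""

    def __init__(self, processors):
        self.free = list(processors)
        self.allocated = {}   # user -> processors currently assigned

    def allocate(self, user, count):
        # Assumed policy: grant all requested processors or none at all.
        if count > len(self.free):
            return None
        granted = [self.free.pop() for _ in range(count)]
        self.allocated.setdefault(user, []).extend(granted)
        return granted

    def release(self, user):
        # Processors go back to the pool for other users.
        self.free.extend(self.allocated.pop(user, []))


pool = RunServer(["p1", "p2", "p3", "p4"])
pool.allocate("alice", 3)            # e.g. a big compilation
print(pool.allocate("bob", 2))       # not enough free processors right now
pool.release("alice")
print(len(pool.allocate("bob", 2)))  # succeeds once alice is done
```

No processor belongs to any particular user, so whatever is free at the moment can be given to whoever needs it.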

Distributed Computing Models: Something More on the Processor Pool Model

Here there is no concept of a home machine: the user logs on to the system as a whole.

Better utilization compared to the workstation-server model, because the entire processing power is available to logged-in users.

More flexible, because services can be easily expanded without adding machines… How? Ideas please.

Not suitable for high-performance interactive applications or graphics operations, because of the heavy network communication involved.

Hybrid Model… Workstation-Server weds Processor Pool.

Why DCS Are Becoming Popular

Inherently distributed applications

Information sharing among users

Resource sharing

Better price-performance ratio

Shorter response time and higher throughput

Higher reliability

Extensibility and incremental growth

Better flexibility

Please read Section 1.4 from the book for the detailed story…

What is a Distributed Operating System

OS = a program that controls the resources of a computer system and provides its users with an interface or virtual machine.

Two basic tasks of an OS are:

To present users with a virtual machine (interface) that is easier to program than the underlying hardware.

To manage the various resources of the system.

An OS used for a distributed environment can be broadly classified into the following types:

Network Operating Systems (NOS)

Distributed Operating Systems (DOS)

The features which differentiate them are system image, autonomy and fault tolerance.

System Image

That is, how the user ‘sees’ the system.

In the case of a NOS it is a collection of distinct machines connected by a communication subsystem, so users are aware that they are using multiple systems.

A DOS hides the existence of multiple computers and provides a single system image; the collection of machines acts as a virtual uniprocessor.

With a NOS a user needs to ‘know’ the location of the resources to execute a job, and different ‘system calls’ are required for local and remote resources.

A DOS user doesn’t need to know this; the OS automatically takes care of locating resources and making the right system calls.

The idea is “Transparency”…

Yatrik’s Rule: “Better Transparency = Better Performance, More Ease”

Autonomy

Each system of a NOS has its own local OS (which may not be the same on every machine), and there is no co-ordination at all.

The only exception to this rule is that when two processes on different computers communicate, they must agree on a communication protocol.

Each system is independent of the others for local resources.

System calls on different computers may be different.

In a DOS, there is a single system-wide operating system and each computer runs as part of that OS.

All computers are tightly interwoven, co-operating closely with each other for optimized resource utilization.

More on Autonomy…

There is a globally valid set of system calls, supported by the OS and available on all computers.

This set of system calls is implemented by a set of programs called the ‘kernel’.

The ‘kernel’ manages and controls the hardware in such a way that all resources are available to other programs through system calls.

Identical ‘kernels’ run on all the computers, and they co-operate with each other when the need arises.

Computers in a NOS have a higher degree of autonomy than those in a DOS.

Fault Tolerance Capability

A NOS has little or no fault tolerance capability.

A DOS has a high (or very high) fault tolerance capability.

A distributed computing system that uses a NOS is referred to as a ‘network system’, whereas one that uses a distributed operating system is a ‘true distributed system’ (or simply a distributed system).

Design Issues of DOS

Some background…

In the design of a centralized OS, it is assumed that the OS has complete and accurate information about the environment in which it is functioning.

In that case the OS can also assume that the result of a status check stays valid while it acts on it.

A DOS, on the other hand, must be designed keeping in mind that complete information about the environment will never be available.

Resources are physically separated, there is no common clock among the multiple processors, and messages between them may be delayed or lost.

Because of all this, a DOS doesn’t have consistent, up-to-date knowledge about the state of the various components, and this makes things complex.

Even though it is complex, the ultimate objective is to provide a virtual uniprocessor system to the user, and for that there are some key design issues.

Transparency

Access Transparency

It means a user should not need, or be able, to recognize whether a resource is remote or local.

A user should access a remote resource the same way as a local one.

System calls should not distinguish between remote and local resources; it is the responsibility of the OS to locate the resource and get the job done.

In short, a well-designed set of system calls is required (the ‘kernel’).

It is NOT possible to have system calls with full access transparency.

Transparency… Location Transparency

Name transparency:

The name of a resource should be independent of the physical connectivity or topology of the system and of the resource’s current location.

Movable resources must be allowed to move without having their names changed.

Resource names must be unique system-wide.

User mobility:

A user should be allowed to access a resource with the same name from anywhere (from any system).

In short… “Each resource on a system should be identified by a global name, and to have that a global resource naming facility should be available.”
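A global resource naming facility of the kind described above can be sketched as a table from global names to current locations. NameService and its methods are hypothetical names for this illustration; a real facility would itself be distributed rather than a single dictionary.

```python
class NameService:
    """Maps system-wide unique names to the node currently holding the resource."""

    def __init__(self):
        self._location = {}

    def register(self, name, node):
        if name in self._location:
            raise ValueError("resource names must be unique system-wide")
        self._location[name] = node

    def move(self, name, new_node):
        # The resource changes node, but its name stays the same.
        if name not in self._location:
            raise KeyError(name)
        self._location[name] = new_node

    def locate(self, name):
        return self._location[name]


names = NameService()
names.register("/projects/report", "node-A")
print(names.locate("/projects/report"))
names.move("/projects/report", "node-C")
print(names.locate("/projects/report"))
```

The name "/projects/report" says nothing about where the resource lives, so users on any node keep using it unchanged after the move.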

Transparency… Replication Transparency

Almost all DOS have a provision to create replicas (of files/resources).

The existence of these replicated resources, and the replication process itself, should be transparent to users.

It is the responsibility of the system to name the various copies of a resource and map them to user-defined names.

The system also has to do replication control, e.g. how many copies, where to place them, when to create or delete a replica etc.
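The mapping from one user-defined name to several system-named copies can be sketched as below. ReplicatedFiles is a hypothetical class, the "copyN" internal naming scheme and the "first few nodes" placement policy are assumptions made only for this sketch.

```python
class ReplicatedFiles:
    """Users see one name per file; the system names and places the copies."""

    def __init__(self, nodes, copies=2):
        self.storage = {node: {} for node in nodes}
        self.copies = copies
        self.replica_map = {}   # user-defined name -> [(node, internal name)]
        self.failed = set()     # nodes currently unavailable

    def write(self, name, data):
        # Assumed placement policy: simply use the first `copies` nodes.
        targets = list(self.storage)[: self.copies]
        self.replica_map[name] = []
        for i, node in enumerate(targets):
            internal = f"{name}.copy{i}"
            self.storage[node][internal] = data
            self.replica_map[name].append((node, internal))

    def read(self, name):
        # Any reachable replica will do; the caller never learns which one.
        for node, internal in self.replica_map[name]:
            if node not in self.failed:
                return self.storage[node][internal]
        raise RuntimeError("all replicas unavailable")


fs = ReplicatedFiles(["n1", "n2", "n3"], copies=2)
fs.write("report", "v1")
fs.failed.add("n1")          # one replica holder goes down
print(fs.read("report"))     # still served, from the copy on n2
```

The user only ever says "report"; how many copies exist and where they live is decided and tracked by the system.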

Transparency… Failure Transparency

Deals with masking partial failures from users, such as communication link failure, storage device failure, machine failure etc.

The system should continue to function in the case of the above (performance may decrease).

Complete failure transparency is not achievable at present.

A 100% failure-transparent system may lead to high cost and low performance because of the high degree of redundancy. E.g. file services.

PKS says “Theoretically possible but practically not justified” (??????????)

Transparency… Migration Transparency

For better performance, reliability and security, a movable object often moves from one node to another.

Migration transparency deals with the automatic handling of movable objects.

Important issues in achieving migration transparency are as follows:

Migration decisions (what to move where) should be taken automatically by the system.

Migration of an object should not require any change in the name of the object.

When a process migrates, the inter-process communication (IPC) mechanism should ensure that a message sent to the migrating process always reaches it.
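One possible way to meet the last requirement is to leave a forwarding address behind on the node a process migrates away from. This is an assumed technique chosen for illustration, not necessarily the one intended here; Node, spawn, migrate and send are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.mailboxes = {}    # process id -> messages delivered here
        self.forwarding = {}   # process id -> node it migrated to


def spawn(node, pid):
    node.mailboxes[pid] = []


def migrate(pid, source, target):
    # A process returning to a node it once left must not forward to itself.
    target.forwarding.pop(pid, None)
    target.mailboxes[pid] = source.mailboxes.pop(pid)
    source.forwarding[pid] = target   # leave a forwarding address behind


def send(node, pid, message):
    # Follow forwarding addresses until the process's current node is reached.
    while pid in node.forwarding:
        node = node.forwarding[pid]
    node.mailboxes[pid].append(message)
    return node.name


a, b, c = Node("A"), Node("B"), Node("C")
spawn(a, "p1")
send(a, "p1", "m1")
migrate("p1", a, b)
send(a, "p1", "m2")            # sender still believes p1 is on A
migrate("p1", b, c)
print(send(a, "p1", "m3"))     # forwarded A -> B -> C
print(c.mailboxes["p1"])
```

The sender keeps using the same process id and the same node it always used; the chain of forwarding addresses hides the moves. A real system would also shorten long chains and cope with messages that arrive during the move itself.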

Transparency… Concurrency Transparency

It is always economical to share system resources among concurrently executing user processes.

The number of resources is always limited, so one process will surely influence the actions of other concurrently executing processes (because of competition).

Concurrency transparency = the above should not be felt.

Each user should always feel that he/she is the sole user of the resources.

Issues:

An ‘event-ordering’ property should ensure a “proper order” of all requests to a resource, to provide a consistent view to all users.

A ‘mutual exclusion’ property should ensure that “at any given point of time only one process accesses the resource”.

A ‘no starvation’ property should ensure that “if every process that is granted a resource (one that must not be used simultaneously by multiple processes) eventually releases it, then every request for that resource is eventually granted”.

A ‘no deadlock’ property… You know what that is…
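Three of these properties can be seen together in a small sketch of a manager for one non-shareable resource: a single holder gives mutual exclusion, a first-come-first-served queue gives an event ordering, and the same queue gives no starvation as long as every holder eventually releases. ResourceManager is a hypothetical name, and FIFO is just one assumed ordering policy.

```python
from collections import deque


class ResourceManager:
    """Grants one non-shareable resource to one process at a time, in FIFO order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid):
        # Mutual exclusion: grant only if nobody holds the resource.
        if self.holder is None and not self.waiting:
            self.holder = pid
            return True
        # Event ordering: later requests queue up behind earlier ones.
        self.waiting.append(pid)
        return False

    def release(self, pid):
        if pid != self.holder:
            raise ValueError(f"{pid} does not hold the resource")
        # No starvation: the longest-waiting request is always served next.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


printer = ResourceManager()
print(printer.request("p1"))   # granted immediately
print(printer.request("p2"))   # must wait
print(printer.request("p3"))   # must wait behind p2
print(printer.release("p1"))   # resource passes to p2
print(printer.release("p2"))   # then to p3
```

Each process only sees its own request being granted sooner or later, which is the "sole user" illusion concurrency transparency asks for.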

Transparency… Performance Transparency

The system should automatically reconfigure itself to achieve optimum performance.

The processing capability of the system should be uniformly distributed among the jobs currently in the system.

Scaling Transparency

The system should scale (expand) without disrupting the activities of users.

This needs an open system architecture and the use of scalable algorithms.

Reliability

Distributed systems are expected to be more reliable than centralized systems, because resources exist in multiple instances.

“Multiple existence” alone cannot do magic; one needs to design the system to make the most of it.

You know what a ‘fault’ is (that which causes a system failure).

There are two types of failure: “fail-stop” and “Byzantine” (the spelling itself suggests something fishy). With fail-stop the component simply halts; with Byzantine it keeps running but gives wrong results.

A fault-handling mechanism should:

Avoid faults

Tolerate faults

Detect and recover from faults

Fault Avoidance

The occurrence of faults should be minimized.

Complete fault avoidance for hardware components is almost IMPOSSIBLE.

Software components must be tested thoroughly to avoid faults.

Reliability… Fault Tolerance

Fault tolerance is the ability of a system to continue functioning in the case of partial system failure.

A few concepts for achieving fault tolerance:

Redundancy Techniques

The basic idea is to avoid a single point of failure by replicating hardware and software components (or maintaining multiple copies).

If one fails then another can be used (at least theoretically).

This creates additional overhead to maintain multiple copies and keep them consistent.

More copies mean better reliability but larger overhead. The question is how to balance the two: how much replication does one want?

To tolerate ‘n’ fail-stop failures, ‘n+1’ replicas are needed; to tolerate ‘n’ Byzantine failures, ‘2n+1’ replicas are needed (why?).

For software components another idea is to use a virtual storage device (stable storage) that can withstand transient I/O faults and decay of the storage media.

Distributed Control

Use a control mechanism that avoids a single point of failure.

E.g. have multiple independent file servers controlling multiple independent storage devices (likewise for name servers and print servers).
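The "(why?)" behind the n+1 and 2n+1 figures can be worked through in a short sketch. With fail-stop failures a dead replica gives no answer, so one surviving replica is enough. With Byzantine failures a faulty replica may answer wrongly, so the correct replicas must outvote it. The function names are hypothetical, and majority voting is assumed as the way replies are combined.

```python
from collections import Counter


def replicas_needed(faults, byzantine=False):
    """Replicas required to tolerate `faults` failures (voting assumed for Byzantine)."""
    return 2 * faults + 1 if byzantine else faults + 1


def first_live_reply(replies):
    # Fail-stop: a failed replica returns nothing (None); any live reply is correct.
    for reply in replies:
        if reply is not None:
            return reply
    return None


def majority_vote(replies):
    # Byzantine: trust a value only if more than half of the replicas agree on it.
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None


# Tolerating one failure (n = 1), correct answer is 7:
print(replicas_needed(1), first_live_reply([None, 7]))              # 2 replicas suffice
print(replicas_needed(1, byzantine=True), majority_vote([7, 99, 7]))  # 3 replicas needed
print(majority_vote([7, 99]))   # with only 2 replicas the liar cannot be outvoted
```

With 2n+1 replicas, the n faulty ones are always outnumbered by the n+1 correct ones, which is exactly what the vote needs.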

Reliability… Fault Detection and Recovery

Commonly used techniques are atomic transactions, stateless servers, and acknowledgement- and timeout-based mechanisms.

Atomic Transactions

For a computation consisting of a collection of operations, either all the operations are performed or none of their effects prevails.

Other concurrent processes cannot modify or see intermediate states.

This helps to preserve the consistency of data objects.

Crash recovery is easier (why?), because a transaction can end in only two states: either all operations are performed or none is.

Stateless Servers

In the client-server mechanism a server can follow one of two paradigms: ‘stateful’ or ‘stateless’.

A ‘stateful’ server maintains a history of its transactions with each client; a ‘stateless’ server does not.

In case of failure stateless servers are better, because they hold no client state that has to be recovered, whereas stateful servers require complex recovery mechanisms.
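The all-or-nothing behaviour of an atomic transaction can be sketched by applying every operation to a private working copy and installing the copy only if all operations succeed. This is one simple assumed technique (real systems typically use logs or shadow copies on stable storage); run_atomically, debit and credit are hypothetical names.

```python
def run_atomically(store, operations):
    """Apply every operation to `store`, or none of them."""
    working = dict(store)          # private copy; fine here because values are ints
    try:
        for op in operations:
            op(working)            # intermediate states exist only in the copy
    except Exception:
        return False               # abort: `store` was never touched
    store.clear()
    store.update(working)          # commit: install the final state
    return True


def debit(name, amount):
    def op(state):
        if state[name] < amount:
            raise ValueError("insufficient funds")
        state[name] -= amount
    return op


def credit(name, amount):
    def op(state):
        state[name] += amount
    return op


accounts = {"alice": 100, "bob": 20}
print(run_atomically(accounts, [credit("bob", 50), debit("alice", 500)]), accounts)
print(run_atomically(accounts, [debit("alice", 30), credit("bob", 30)]), accounts)
```

In the first call the credit to bob has already been applied to the working copy when the debit fails, yet the real store shows none of it. That is why recovery is easy: after a failure the data is in either the old state or the new one, never in between.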

Reliability… Acknowledgement- and Timeout-Based Transmission of Messages

A node crash or communication link failure may interrupt communication between two processes, resulting in the loss of a message.

IPC mechanisms must have ‘something’ to detect the loss of a message.

This “something” = time.

So if the acknowledgement doesn’t come ‘in time’, the message should be retransmitted.

This retransmission may also cause duplicate messages, which must be detected and discarded.
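The scheme above can be simulated in a few lines. Sequence numbers are assumed here as the way to recognise duplicates, and instead of a real clock the sketch takes a scripted list saying what gets lost on each attempt. Receiver and send_reliably are hypothetical names.

```python
class Receiver:
    def __init__(self):
        self.delivered = []   # payloads handed to the application
        self.seen = set()     # sequence numbers already accepted

    def on_message(self, seq, payload):
        if seq not in self.seen:          # a duplicate is acknowledged but not redelivered
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq                        # the acknowledgement


def send_reliably(payload, seq, receiver, losses, max_tries=5):
    """Retransmit until acknowledged; returns the number of attempts used.

    `losses` is an iterator yielding "msg", "ack" or None for each attempt,
    standing in for what the network loses before the timeout expires.
    """
    for attempt in range(1, max_tries + 1):
        lost = next(losses, None)
        if lost == "msg":
            continue                      # message lost: timeout, retransmit
        ack = receiver.on_message(seq, payload)
        if lost == "ack":
            continue                      # ack lost: sender times out, retransmits
        if ack == seq:
            return attempt
    raise TimeoutError("no acknowledgement received")


rx = Receiver()
attempts = send_reliably("hello", seq=1, receiver=rx, losses=iter(["msg", "ack"]))
print(attempts, rx.delivered)
```

The second attempt actually reaches the receiver, but its acknowledgement is lost, so the third attempt is a duplicate. The sequence number lets the receiver acknowledge it again without delivering "hello" twice.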

The main drawback of a reliable system is the POTENTIAL LOSS OF EXECUTION TIME EFFICIENCY DUE TO THE EXTRA OVERHEAD INVOLVED IN IMPLEMENTING THESE TECHNIQUES.

Flexibility

What is the need for so-called ‘flexibility’?

Ease of modification: it should be easy to incorporate changes in a user-transparent manner, with minimum interruption.

Ease of enhancement: it should be easy to add new functionality and services to the system. If a user wants his/her own service, or wants to modify an existing service, that should be allowed.

The design of the kernel greatly influences “flexibility”.

The “kernel” is the central part of the OS, which provides the basic system facilities. It operates in a separate address space that is not accessible to users (so users can’t modify it).

You know that in a distributed system identical kernels run on each node.

Flexibility…

In distributed systems we mainly refer to two kinds of kernel: monolithic and microkernel.

[Figure: Monolithic kernel model: on each of three nodes, user applications run on top of a monolithic kernel that includes most OS services; the nodes are joined by network hardware. Microkernel model: on each of three nodes, user applications and server/manager modules run on top of a microkernel that provides only minimal facilities; the nodes are joined by network hardware.]

Flexibility…

In the monolithic kernel model, most OS services such as process, memory, device and file management, IPC etc. are provided by the kernel.

So in this case the kernel has a large monolithic structure.

In the microkernel model, the main funda is to keep the kernel ASAP (not that one…! It is As Small As Possible).

So the kernel provides only minimal facilities, limited to IPC, low-level device management, low-level process management etc.

All other OS services, call handling etc. are implemented by user-level server processes; each has its own address space and can be programmed separately.

Now tell me which is better.

Flexibility…

The basic advantages of using the microkernel model in a DOS are… (Why do we use C/C++/C#/VB instead of assembly?)

The flexibility advantage of the microkernel model.

In theory the microkernel model gives poorer performance, but that is not true in practice: the overhead of message passing is usually negligible compared to other factors.

Performance

The performance of an application on a DOS should be >= its performance on a centralized system.

Design principles are as follows:

Batch if possible

Cache if possible

Minimize copying of data

Minimize network traffic

Take advantage of ‘fine-grain’ parallelism for multiprocessing

Scalability

Scalability = the capability of a system to adapt to increased service load.

A DOS should be designed to cope with growth in the number of nodes, and for that the design funda are:

Avoid centralized entities, because:

Failure of the entity often brings the entire system down (fault tolerance is affected).

The performance of the centralized entity becomes a bottleneck.

Even if the centralized entity has enough processing power, the capacity of the network may become the limit (contention!).

In WAN-based systems it is inappropriate to serve all requests from a single server.

Avoid centralized algorithms:

Centralized algorithm = collect information from all nodes, process it at one node and distribute the results to the other nodes (e.g. a scheduling algorithm).

Decentralized algorithms should be used instead: global state information is not collected, decisions are based on locally available information, and no global clock is assumed.

Scalability…

Perform most operations on client workstations.

The server is a common resource for several clients, so server cycles are more precious than client cycles.

This principle enhances the scalability of the system, as it allows graceful degradation as the system grows.

Caching is frequently used.

Heterogeneity

Heterogeneous DOS = interconnected sets of dissimilar hardware and software systems.

There are lots of incompatibilities, including data formatting schemes, communication protocols, topologies etc.

Some form of data translation is necessary for interaction between two incompatible nodes.

This translation can be done at the sender’s or the receiver’s end, and for that each node must have a translator for each other format.

If there are n formats, then n-1 translators are needed at each node, for a total of n(n-1) translators in the entire system (!!!!).

This need can be reduced by using an intermediate standard data format. Then each node only needs to know how to read and write the standard format (so simple…!).
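The translator arithmetic, and what an intermediate standard format looks like in practice, can be sketched as follows. Counting one writer and one reader per node (2n in total) is an assumption about how translators are counted; the standard format used here is a 4-byte big-endian ("network byte order") integer, chosen purely as an example.

```python
import struct


def translators_pairwise(n_formats):
    # Each of the n formats needs a translator from each of the other n-1.
    return n_formats * (n_formats - 1)


def translators_with_standard(n_formats):
    # Each format needs only a writer to and a reader from the standard format.
    return 2 * n_formats


def to_standard(value):
    """Encode an integer in the agreed standard format (4 bytes, big-endian)."""
    return struct.pack("!i", value)


def from_standard(data):
    """Decode an integer from the agreed standard format."""
    return struct.unpack("!i", data)[0]


for n in (3, 5, 10):
    print(n, translators_pairwise(n), translators_with_standard(n))

# Two machines whose native layouts differ still agree on the wire format.
little_endian_native = struct.pack("<i", 1025)
big_endian_native = struct.pack(">i", 1025)
print(little_endian_native != big_endian_native)
print(from_standard(to_standard(1025)))
```

The pairwise count grows with the square of n while the standard-format count grows linearly, and adding a new kind of machine means writing two translators instead of touching every existing node.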

Security

Prevention of unauthorized access.

More difficult in a DOS… (You know why…) I thought you really knew…!!!!

Compared to a centralized system, a DOS has the following additional requirements:

It should be possible for the sender of a message to know that the message was really received by the intended receiver.

It should be possible for the receiver of a message to know that the message was sent by the genuine sender.

It should be possible for both the sender and the receiver of a message to be guaranteed that the contents of the message were not changed while it was in transit.

Cryptography is the only known practical method.

When security depends on the fewest possible entities, the system is supposed to be more secure.
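As one illustration of how cryptography addresses the second and third requirements (genuine sender, unchanged contents), the sketch below attaches a keyed hash (HMAC) to each message using Python's standard hmac and hashlib modules. It assumes the sender and receiver already share a secret key; how that key is distributed, and the first requirement (proof of receipt), are outside this sketch. The function names sign and verify are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that only holders of `key` can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check that `message` was tagged with `key` and has not been altered."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"shared-secret-for-demo-only"       # assumed to be known to both ends
message = b"transfer 100 to bob"
tag = sign(key, message)

print(verify(key, message, tag))                      # genuine and unmodified
print(verify(key, b"transfer 900 to bob", tag))       # changed in transit
print(verify(b"attacker-key", message, tag))          # not from the genuine sender
```

Only the single shared key has to be protected, which fits the point that a system is easier to secure when security depends on as few entities as possible.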

Emulation of Existing Operating Systems

For commercial reasons it is important that a newly designed DOS should be able to emulate popular operating systems such as Unix/Linux.

New software can be written using the system call interface of the new OS to take full advantage of distributed computing, while the vast amount of existing software can also run on the same system without being rewritten.

So the new DOS allows both types of software to run side by side…

Thank You
