

The MARUTI Hard Real-Time Operating System

Shem-Tov Levi, Satish K. Tripathi, Scott D. Carson and Ashok K. Agrawala

Department of Computer Science

University of Maryland

College Park, MD 20742

Abstract

The MARUTI operating system is designed to support real-time applications on a variety of hardware systems. The kernel supports objects as primitive entities, and provides a communication mechanism that allows transparent distribution in networked systems. Fault tolerance is provided through replication and consistency-control mechanisms. Most importantly, MARUTI supports guaranteed-service scheduling, in which jobs that are accepted by the system are verified to satisfy general time constraints.

Guaranteed-service scheduling means that, given a job with a set of service requirements and time constraints, the system automatically verifies the schedulability of each component of the job with respect to the job's constraints and those of other jobs in the system. These time constraints include those that govern interrupt processing, which allows the MARUTI approach to succeed where less rigorous approaches do not. The result is that MARUTI applications can be executed in a predictable, deterministic fashion.

1 Introduction

Conventional real-time operating systems are based on hard-priority schedulers. Requirements for real-time applications, however, are specified as time constraints. The difficulty of translating system requirements into priority schemes results in ad hoc techniques for assigning priorities. Further, priority inversions, in which lower priority tasks such as interrupts preempt high priority tasks, cause anomalous behavior in priority-driven systems.

Proving that priority-driven applications meet their timing requirements is a cumbersome process. The basic technique is to enumerate all event sequences, eliminate those precluded by the priority scheme, and show that the resulting sequences meet the requirements. In complex systems, this process can become overwhelming.

The fundamental problem with the priority-based approach is that the requirements are specified in terms of a service, namely, time, that the system does not provide. An alternate approach, described in this paper, is for the system to provide guaranteed execution based on task deadlines. In this approach, the operating system guarantees that once a task is accepted, its timing deadlines will be met. The MARUTI operating system provides such a service.

1.1 Objectives

MARUTI is a hard real-time, fault-tolerant, distributed operating system. As such, it must provide a guarantee to each of its accepted jobs that the deadline associated with the job is met, it must support distribution of computations, and it must allocate resources in a manner that supports the fault tolerance goals of its accepted jobs.

MARUTI is built as a modular system, to allow design, analysis, and verification of properties (temporal and other specified properties) of various services and algorithms. The architecture of MARUTI emphasizes the independence of various elements of the system. Thus it is possible to replace servers with others that provide the same services, in a "plug in" manner. Temporal determinism is achieved by encapsulation of the services within localities that support deterministic time bounds, along with the use of explicit time expressions in the decision making process for scheduling.

Figure 1: Example: Single Access Server Object (diagram labels: owner directory; user-object 1; user-object 2; local calendar; service requirements; server-objects; ticket protection mechanism; consistency control mechanism; executable object joint; executable object body)

Many real-time applications need temporal determinism to support more than simple deadline guarantees and response time requirements. For example, control applications need to enforce lead and lag parameters in order to achieve proper stability requirements. Therefore, both start-time and finish-time constraints are to be imposed on periodic as well as on aperiodic computations.

In this paper we introduce the principles according to which MARUTI is designed, and we give an introductory description of the system. We start by describing the principles which have guided the MARUTI design, and then we describe MARUTI's components. Finally, we give an operative scenario of job acceptance, as implemented in our system.

1.2 Approach and Principles

The MARUTI operating system is object-oriented, supporting the encapsulation of services. Related objects in MARUTI are semantically linked through joints, as described in Figure 1. Remote related objects are locally linked through local agents of the remote objects. Each local agent is responsible for the communication between its locality and its corresponding remote service, as well as for representing the remote service in schedulability verification and reservation schemes ([1, 7, 8]). The object principle and the use of the joints allow each access to an object to be direct, and the binding philosophy of the operating system supports it. Access to an executing object is an invocation of a particular service of that object. The joint allows many users to access a particular service, and provides mutual exclusion along with consistency control.

The second principle of MARUTI is the use of a calendar, a data structure implemented in the joint of an object. These data structures represent non-convex time intervals, and allow verification of schedulability, reservation of guaranteed services, and synchronization. Projection mechanisms support projections of time constraints between different localities, each of which has access to a different time server. These projections maintain the required event ordering and assure the satisfaction of the timing constraints.
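As an illustration of the calendar idea, the following minimal Python sketch stores a non-convex time interval as a sorted list of disjoint convex intervals and supports a free-time test and a reservation. The class name `Calendar`, its methods, and the half-open `(begin, end)` representation are assumptions made for this example; they are not taken from the MARUTI implementation.

```python
import bisect

class Calendar:
    """Reserved time as a sorted list of disjoint half-open intervals [begin, end)."""

    def __init__(self):
        self.reserved = []  # list of (begin, end), kept sorted by begin

    def is_free(self, begin, end):
        """True if [begin, end) overlaps no reserved interval."""
        i = bisect.bisect_left(self.reserved, (begin, end))
        # Only the neighbours of the insertion point can overlap,
        # because the stored intervals are sorted and disjoint.
        if i > 0 and self.reserved[i - 1][1] > begin:
            return False
        if i < len(self.reserved) and self.reserved[i][0] < end:
            return False
        return True

    def reserve(self, begin, end):
        """Reserve [begin, end) if it is free; report whether it succeeded."""
        if begin >= end or not self.is_free(begin, end):
            return False
        bisect.insort(self.reserved, (begin, end))
        return True
```

A reservation that would overlap an existing one is simply refused, which mirrors the role of the calendar in verifying schedulability before a guarantee is given.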

Another principle in MARUTI design is the use of semantic links to express relations between objects. The links are established for both users (invokers) and owner, and deletion of the object only occurs when no links are present. Being semantic links, exceptions and validity tests are reduced to a minimum after a link is established. The links are established by the binding and loading processes that are discussed in detail later in this paper. In these processes, the protection mechanisms are activated and authorizations are established, to allow a direct access afterwards. Links to remote objects are established with the remote objects' local agents.

Jobs in MARUTI are invocations of executable objects. The requirement of a reactive system to accept new jobs while executing already-accepted guaranteed jobs justifies the next principle adopted in this design: supporting on-line and off-line execution disciplines. Objects executing the guaranteed jobs belong to the on-line discipline. Objects with non-deterministic execution time bounds, as well as non-real-time jobs, are executed in an off-line discipline. An off-line execution is carried out only in cases where no real-time execution can be carried out. Since such an execution can be unbounded with respect to its execution time, it can be preempted by the on-line scheduler, using the time service to trigger the preemption.

Fault tolerance is an integral part of MARUTI. Each object's joint contains a consistency control mechanism to manage alternatives (redundant objects with state information) or replicas (redundant stateless objects). The resource allocation algorithm supports a user-defined level of connectivity of the computation graph, where redundancy can be established temporally (execute again) or physically (parallel execution of alternatives). Space redundancy also supports node and link failures, using roll-forward recovery techniques. Critical services are provided with forum and quorum protocols (e.g. the clock synchronization algorithm). Under a physical redundancy discipline, the communication subsystems use a node-to-node acknowledgement scheme, to avoid long end-to-end delays.

The MARUTI kernel is a collection of server objects that is "booted" when the system is turned on and stays core-resident throughout the active lifetime of the system. There are interactions among the various servers of the operating system's kernel. Each of the servers is subjected to a defined set of operations and operates on a defined set of other objects. The services provided by the kernel are: interrupt handler, time service, scheduler, loader, and communication service. All the other services are viewed at higher levels, commonly referred to as the application or the services level. The services provided at the higher levels are: allocation, binding, login service, name service, directory service and file service.

Different jobs in MARUTI may share objects or resources (which are also considered as objects), and therefore necessitate management of some type of queues. A queue of guaranteed executable objects is a data structure manipulated by a scheduler: a calendar.

1.3 What's New?

The MARUTI design concept directly addresses the needs of "next generation" real-time systems ([10]). MARUTI differs from other existing operating systems in some of its basic principles. Some major differences are listed below.

• MARUTI is a real-time operating system, based on an architecture that supports real-time system requirements with a very high degree of determinism and predictability.

• It is an object-oriented operating system, and as such it maintains the properties of transparency of distribution to users.

• The system is driven by a time constraint model that imposes restrictions both on beginnings and on ends of executions, unlike many other systems that deal solely with the ends (deadlines).

• Requests for invocations of services that are accepted and acknowledged are guaranteed to satisfy their time constraints.

• MARUTI gives a coherent solution to tasks that are of conventional type and to tasks that are interrupt-driven. Most of the other proposed solutions for real-time systems are not suitable for interrupt-driven tasks and may violate deadlines because of interrupts.

• The MARUTI architecture supports a reactive system definition. External requests are expected throughout execution of already accepted ones. MARUTI provides a means for verifying their schedulability, and for binding the requests to proper servers and resources.

Figure 2: MARUTI: User View of Management (diagram labels: human user; login server; time server; loader; allocator; binder; on-line scheduler; off-line scheduler; calendar; executing objects; non-executing objects; name server; protection; directory server; file server; directory)

An overview of an execution invocation with MARUTI's components, as seen by a user, is shown in Figure 2. Note that the block comprising the human user and its login server can be replaced by any user object which requires services of other objects or resources.

2 Schedule Feasibility Verification and Allocation

In this section, we introduce the concept of a time-oriented schedule feasibility check for accepting jobs into the system; the concept is expressed as schedule feasibility conditions. We then introduce an allocation condition, which is based on the existence of a feasible schedule, and thus maintains a guarantee to meet temporal requirements.

In the following definitions, we use notation which we adopted in [7]. Computations and their bounds are represented by time intervals. Bounds are represented by contiguous intervals (the term used is convex) while computations are allowed to have gaps. The bounds on an interval $A$ are expressed as $\mathrm{begin}_{\min}(A)$ and $\mathrm{end}_{\max}(A)$, respectively. A convex interval $A$ is assumed to be delimited by $t_A^-$ and $t_A^+$. The leftmost convex sub-interval of a non-convex interval $B$ is denoted as $\triangleleft B$, and respectively the rightmost one by $\triangleright B$.

$\otimes$ is the interval cover operator, the interval containment property is denoted by $\gg$, the interval intersection operator by $\cap$, the disjoint relation by $\bowtie$, and the empty set by $\emptyset$. Finally, generalization of convex interval relations to non-convex ones is denoted by a tilde over the relation symbol. For example, the disjoint relation between two non-convex time intervals is written $\widetilde{\bowtie}$.

Definition 1 The laxity of a (computation) non-convex interval $P_i$ that is constrained within a (window) convex interval $TC_i$ is defined by the pair $(x_-^{TC_i}, x_+^{TC_i})$, such that

$$x_-^{TC_i} = \mathrm{begin}_{\min}(P_i^-) - t_{TC_i}^-, \qquad x_+^{TC_i} = t_{TC_i}^+ - \mathrm{end}_{\max}(P_i^+),$$

where $P_i^- = \triangleleft P_i$, $P_i^+ = \triangleright P_i$. □
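To make the laxity notion concrete, the sketch below computes the pair for a computation given as a list of convex pieces inside a convex window. It assumes that the two components are the slack between the start of the window and the start of the leftmost piece, and between the end of the rightmost piece and the end of the window. The function name `laxity` and the tuple representation are hypothetical and serve only this example.

```python
def laxity(pieces, window):
    """Return (x_minus, x_plus) for a non-convex computation inside a window.

    pieces: list of (begin, end) convex sub-intervals of the computation.
    window: (begin, end) convex window that constrains the computation.
    """
    leftmost = min(pieces, key=lambda p: p[0])    # leftmost convex piece
    rightmost = max(pieces, key=lambda p: p[1])   # rightmost convex piece
    x_minus = leftmost[0] - window[0]    # slack before the computation starts
    x_plus = window[1] - rightmost[1]    # slack after the computation ends
    return x_minus, x_plus
```

Under this reading, a negative component indicates that the computation sticks out of its window on that side, so both components must be non-negative for the computation to respect its bounds.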

A sufficient condition for having a feasible non-preemptive schedule can be given in terms of laxity. Intuitively, one can capture the condition as a total disjoint relation of all the computation intervals while maintaining each interval within its bounds.

Condition 1 Let the incoming time constraint have an occurrence window $TC_{in}$ and a non-convex computation requirement $P_{in}$. Let $V$ be the verification interval, derived from the duration of a time constraint $TC_x$, for which $TC_x = TC_{in}$ or $TC_x \gg TC_{in}$, such that $\mathcal{I}^0$, the set of already accepted time constraints that intersect with the verification window $V$, satisfies

$$\nexists\, TC^{(i)} \in \mathcal{I}^0 : TC^{(i)} \gg TC_x.$$

The incoming time constraint is non-preemptively schedulable if

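$$\forall i, j : TC^{(i)} \in \mathcal{I}^0 : \quad \exists\, x_-^{TC^{(i)}} \ge 0,\ \exists\, x_+^{TC^{(i)}} \ge 0,\ \exists\, x_-^{TC_{in}} \ge 0,\ \text{and}\ \exists\, x_+^{TC_{in}} \ge 0 : \quad \widetilde{\bowtie}\,\{P^{(j)}\},$$

where $\{P^{(j)}\}$ is the set of computation intervals of $\{TC_{in}\} \cup \{TC^{(i)} \in \mathcal{I}^0\}$. □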
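One way to read the intuition behind Condition 1 is as a window test combined with a pairwise-disjointness test. The sketch below is a simplified check under that reading, for computations that have already been placed at fixed times. The function names and the `(pieces, window)` representation are assumptions of this example, and the sketch does not search for a placement, which a real acceptance test would have to do.

```python
from itertools import combinations

def within(pieces, window):
    """True if every convex piece lies inside the convex window."""
    return all(window[0] <= b and e <= window[1] for b, e in pieces)

def disjoint(p, q):
    """True if no piece of p overlaps a piece of q (half-open intervals)."""
    return all(e1 <= b2 or e2 <= b1 for b1, e1 in p for b2, e2 in q)

def nonpreemptive_ok(accepted, incoming):
    """accepted: list of (pieces, window) pairs; incoming: one such pair."""
    jobs = accepted + [incoming]
    # Each computation must stay within its own window (non-negative laxity).
    if not all(within(p, w) for p, w in jobs):
        return False
    # All computation intervals must be pairwise disjoint.
    return all(disjoint(p, q) for (p, _), (q, _) in combinations(jobs, 2))
```

The incoming constraint is rejected either when it overlaps an accepted computation or when it cannot be kept inside its own window.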


In a preemptive scheduling discipline, the durations of the intervals play an important role. However, we have to take into account the non-convex nature of the computation intervals. The duration of a convex interval $A$ is denoted by $\|A\|$.

Definition 2 The set of maximal convex subintervals of convex time intervals $A$ and $B$ is defined as $\mathcal{S}(\{A, B\})$, such that

• $A \cap B = \emptyset \Rightarrow \mathcal{S} = \{A, B\}$

• $A \cap B \neq \emptyset \Rightarrow \mathcal{S} = \{A \otimes B\}$.

The set of maximal convex subintervals of a non-convex time interval $D$ is the set of maximal convex subintervals of all its convex members $\{d_i\}$. □
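Definition 2 corresponds to the familiar "merge overlapping intervals" computation: intersecting convex intervals are replaced by their cover, and disjoint ones are kept apart. The following Python sketch shows this for any number of convex members. It assumes closed intervals, so intervals that merely touch are treated as intersecting; the function name is hypothetical.

```python
def maximal_convex_subintervals(intervals):
    """Merge intersecting convex intervals into their covers.

    intervals: iterable of (begin, end) pairs with begin <= end.
    Returns a sorted list of pairwise disjoint convex intervals.
    """
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1]:
            # Intersects the previous cover: extend that cover.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Disjoint from everything seen so far: start a new cover.
            merged.append((begin, end))
    return merged
```

For two intervals the result has either two members (the disjoint case) or one member (the cover), matching the two cases of the definition.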

A sufficient condition for having a feasible preemptive schedule can be given in terms of the set of maximal convex subintervals. The feasibility of a preemptive schedule is based on the possibility to have intersecting intervals that can be reconstructed by preemption without contradicting their bounds.

Condition 2 Let the incoming time constraint be a time constraint with an occurrence window $TC_{in}$ and a non-convex computation requirement $P_{in}$. Let $V$ be the verification interval, derived from the duration of a time constraint $TC_x$, for which $TC_x = TC_{in}$ or $TC_x \gg TC_{in}$, such that for $\mathcal{I}^0$, the set of already accepted time constraints that intersect with the verification window $V$, there exists no time constraint that contains $TC_x$. Let $\mathcal{I}_V$ be

$$\mathcal{I}_V = \mathcal{I}^0 \cup \{TC_{in}\} - \{TC_x\}.$$

The incoming time constraint is preemptively schedulable if

Vi:TCiEZv Vk:p[k)Epi

A IIs ll _< I lV l l - IIP (k)ll

Vs~ES( Z v ) Vk:P(k ) E p~:

where $s_j$ are the convex subintervals of $\mathcal{S}(\mathcal{I}_V)$. □

The above conditions are implemented in MARUTI's off-line service access point of the scheduler (INSERT_TC), while reserving a time interval in the object (or resource) calendar.

Our model of allocation is based on the fact that computations in MARUTI are constructed from objects and resources. The objects that participate in a computation are related to each other via semantic links that are represented in their joints. The temporal properties of each relation are expressed as either convex or non-convex time intervals in a calendar within the relevant joint. We distinguish between objects and resources because of differences in fault tolerance properties that are related to monotonicity of faults, and because of properties that are related to damage containment in case of faults. Monotonic faults become permanent after their occurrence, while transient faults may disappear within a short time. The distinction between transient and monotonic faults, as expressed in our object/resource model, allows designers to use two possible recovery mechanisms. We denote the most common one as temporal redundancy, in which a "retry" effort is executed upon a fault detection. This mechanism is perfectly suited for faults whose existence may be a transient phenomenon. The roll-back type of recovery belongs to these mechanisms too. Real-time constraints may conflict with temporal redundancy, because the time needed for it does not always exist. Furthermore, in case of a monotonic failure, retrying is not possible. In such cases, only physical redundancy can increase the system's resilience to failures. The roll-forward type of recovery and N-version programming (e.g. [3, 9]) belong to these mechanisms.


Let each executable object instance $p$ have a set of resource requirements and service requirements $DS_p$, called its dependency set. Restricting $p$ with a time constraint $TC_p$ implies a projected time constraint on each member of its dependency set. Each projection is a result of the temporal relation between $p$'s execution and its requirements. A service requirement can be executed by another executable object instance chosen from a set of alternatives. Hence, we can define the dependency set as follows.

Definition 3 The dependency set of an object $p$ with a time constraint $TC_p$ is

$$DS_{p,TC_p} = \bigl\{\, \{\langle R_i^{(p)}, TC_{R_i^{(p)}} \rangle : 1 \le i \le k\},\ \{S_i^{(p)} : 1 \le i \le n\} \,\bigr\},$$

where $R_i^{(p)}$ are $p$'s resource requirements, and $S_i^{(p)}$ are $p$'s service requirements, such that

$$S_i^{(p)} = \{\langle s_j^{(p)(i)}, TC_{s_j^{(p)(i)}} \rangle : 1 \le j \le M^{(p)(i)}\}.$$

The schedule feasibility conditions, Conditions 1 and 2, allow defining the conditions for an object to be allocatable.

Condition 3 An object $p$ is allocatable if it is schedulable, its resource requirements are schedulable, and, if the set of its service requirements is not empty, for each of its service requirements there is at least one allocatable service alternative. □

The above condition is implemented in the allocator ALLOCATE service access point.
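Condition 3 is recursive, and a short sketch may help to see its shape. The Python function below is a minimal reading of the condition, not the ALLOCATE implementation: the dictionary layout, the function name, and the `schedulable` callback (standing in for Conditions 1 and 2) are all assumptions of this example, and the dependency graph is assumed to be acyclic.

```python
def allocatable(obj, schedulable):
    """Recursive reading of Condition 3.

    obj: dict with keys
      'name'      - identifier of the object,
      'resources' - list of resource names it requires,
      'services'  - list of service requirements, each given as a list
                    of alternative objects (dicts of the same shape).
    schedulable: callable(name) -> bool, a stand-in for the schedule
                 feasibility check of an object or a resource.
    """
    if not schedulable(obj["name"]):
        return False
    if not all(schedulable(r) for r in obj["resources"]):
        return False
    # Every service requirement needs at least one allocatable alternative.
    return all(
        any(allocatable(alt, schedulable) for alt in alternatives)
        for alternatives in obj["services"]
    )
```

An object with no service requirements is allocatable as soon as it and its resources are schedulable, since `all` over an empty list is true.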

3 MARUTI Components

MARUTI's components are objects. Each object provides a set of services, each of which is invoked through a defined service access point (SAP). A collection of objects called the kernel of the system is core-resident throughout the operation of the system. Each object that requires the services of another object must first bind itself to that object via a semantic link. The interconnections of the kernel members are kept active all the time, to avoid the need to re-bind them.

3.1 Kernel Components

1. Interrupt Handler. Each interrupt service is reserved as any other object, yet it is different in the sense that it requires an event occurrence (the interrupt) as an enabling condition for its execution.

There are three SAPs for the MARUTI interrupt handler.

• Interrupt Definition. Adds a member to the Interrupt_Map, a data structure implemented as an ordered and balanced tree, whose nodes are the tuples

< device_identifier, object_server > .

The new member is inserted by an insertion algorithm (e.g. [5, chapter 6.2.3]) that keeps the tree ordered and balanced.

• Interrupt Removal. The undoing of the above operation.

• Interrupt Service. An interrupt is served according to the following.

Only a very simple action is taken in response to an interrupt. First, interrupts are masked while being served. Then, the state of the interrupt service object is changed to an active one, indicating that the interrupt has already occurred. After these simple, short and time-deterministic operations are carried out, control is set back at the interrupted point and interrupts are enabled. It is important to note that resources (or objects) are reserved for the interrupt service before the interrupt occurs. The reservation is done through the interrupt definition SAP; however, it differs from a regular reservation in that it sets the service in an idle state.


2. Time Service. Provides the knowledge of time, the past, the present and the future, to executing objects. Synchronization or ordering of events in the system may be associated with the past, in the sense that its requirements are due to events that have already occurred. The major property required in this aspect is that any two clocks in the system must differ from each other by the smallest possible value. The second issue, providing the knowledge of the global (universal) time, may be associated with the present. Here the service is to answer questions of the kind "what is the time now?". In that aspect, the major property required from each clock in the system is to be correct with small drifts, or in other words close to the global (universal) time. A third issue is the way in which a time service deals with projections onto the future. This issue is of extreme importance in hard real-time systems, in which allocation decisions are taken in accordance with events that have not occurred yet, but are known to occur in a known time interval in the future. The MARUTI time service is implemented as a time provider. This concept requires the following SAPs (Service Access Points):

• AWAKE_AT: a service access point that invokes the timer to awake a specified object at a specified time.

• AWAKE_AFTER: similar to AWAKE_AT, but in relative time units instead of absolute ones.

• GET_TIME: establishes a connection to a provider where the current time and the error bounds for that particular provider are given.

• SET_TIME: allows changing of "political time" as in time zone interpretation.

• AGREE_ON_FORUM: serves the initialization of the participating forum of a clock synchronization. A most important parameter in forum establishment is the degree of fault tolerance required. The accuracy of the service is also a property that is derived from properties of participants in the quorum. Hence, an upper level service whose goal is to set an accuracy level may use this SAP.

• GIVE_BOUNDS: for predicting inaccuracies in the future, the time server provides future time bounds.

3. Scheduler. The scheduler is invoked each time a job terminates one instance of a time constraint, to give control of the resource to another instance (if there exists one). This next instance is picked up from a calendar according to the scheduling policy (e.g. least-slack, earliest deadline, etc.). The calendar is ordered according to the time constraints and the scheduling policy. The allocator invokes the schedulability check of a requested time constraint and upon positive verification reserves the resource for the requesting user. This reservation prevents conflicts between different users. There is a time-out for this reservation. A user must invoke the scheduler in order to allow the execution to start. If such an invocation fails to arrive on time, the reservation is canceled, and a removal of the time constraint from the calendar takes place. The invocation for execution comes from the loader, or by an automatic bind-allocate-load-execute sequence.

In MARUTI the scheduler is independent of the allocator. The scheduler uses a calendar to pass control to the next available job, if the state of the job allows it. The following SAPs (Service Access Points) are provided for the scheduler.

• INSERT_TC: is used to insert a time constraint (TC) into the calendar with a non-operative state, using the PUSH_TC algorithms from [8]. For a periodic time constraint, an automatic PUSH_TC of its next instance is done by the EXIT SAP.

• REMOVE_TC: removes a time constraint from the calendar. For example, the reservation with time-out scheme in allocation can be written as:

AWAKE_AT(now + time_out, REMOVE_TC(time_constraint)).

• EXIT: the execution of each time constraint is terminated by the invocation of the scheduler's EXIT service.

• CHANGE_TC_STATE: allows changes to the state of a time constraint in an object's calendar. Possible invokers are: the loader, the interrupt handler, the object itself (an auto-sleep entry).

4. Loader. The loader's task is to load the job into memory and to convert the indirect addresses into direct addresses. The invocation of alternative objects (as per the fault tolerance scheme) is also imposed by the loader. Alternatives that have already been reserved by the allocators are either loaded (invoked) or removed. The SAPs required for the above service are:

• LOAD: invokes a sequence of CHANGE_TC_STATEs (from idle to active) of reserved time constraints in calendars. In addition, each loaded object's addressing is converted to direct addressing.

• UNLOAD: invokes a sequence of REMOVE_TCs of time constraints in calendars that reserved time for these time constraints.

5. Communication. Two basic communication mechanisms are provided: objects that reside at the same site communicate via a shared buffer, and remote objects via message passing. The communication subsystem provides the system with synchronization ability as well. Because of the stringent timing requirements, communication protocols must take into account the deadline guarantee. The LN (Local Network) low level layers are implemented in the kernel. Higher layers of the communication protocols are included within agent objects at application layers.

The communication media are considered as resources of the system, and as such, a reservation calendar must be established for each instance. The calendar is placed at each end-point of the communication resource and a service is reserved according to the agent requirements.
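The interplay between the time service and the scheduler in the reservation with time-out scheme can be sketched in a few lines of Python. The classes below are a simulation under stated assumptions, not the kernel's interfaces: `TimeService`, `Scheduler`, `reserve_with_timeout`, the state names `"idle"` and `"active"`, and the use of integer simulated time are all introduced for this example. The method names only echo the AWAKE_AT, AWAKE_AFTER, INSERT_TC, REMOVE_TC and CHANGE_TC_STATE access points described above.

```python
import heapq
import itertools

class TimeService:
    """Simulated time provider with AWAKE_AT / AWAKE_AFTER style calls."""

    def __init__(self):
        self.now = 0
        self._queue = []                # heap of (time, sequence, action)
        self._seq = itertools.count()   # tie-breaker for equal wake-up times

    def awake_at(self, when, action):
        heapq.heappush(self._queue, (when, next(self._seq), action))

    def awake_after(self, delay, action):
        self.awake_at(self.now + delay, action)

    def advance_to(self, when):
        """Move simulated time forward, running every action that is due."""
        while self._queue and self._queue[0][0] <= when:
            self.now, _, action = heapq.heappop(self._queue)
            action()
        self.now = when

class Scheduler:
    def __init__(self):
        self.calendar = {}              # time constraint id -> state

    def insert_tc(self, tc):
        self.calendar[tc] = "idle"      # reserved, but not yet operative

    def change_tc_state(self, tc, state):
        self.calendar[tc] = state

    def remove_tc(self, tc):
        self.calendar.pop(tc, None)

def reserve_with_timeout(clock, sched, tc, time_out):
    """Reserve tc and cancel it if nobody activates it before the time-out."""
    sched.insert_tc(tc)

    def cancel():
        if sched.calendar.get(tc) == "idle":
            sched.remove_tc(tc)

    clock.awake_after(time_out, cancel)
```

In this sketch the cancellation checks the state first, so a reservation that the loader has already switched from idle to active survives the time-out, while an unclaimed one is removed from the calendar.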

3.2 Application Level Components

1. Allocator. Preallocation of all the required services and resources is needed in order to support the guarantee given to meet the deadline. The allocation scheme is based on the interval union of all the possible requirements, allocating for the set of maximal convex subintervals that result from the union.

The allocator's main concern is the fault tolerance restriction imposed on the computation. The goal of the allocator is to create a computation graph with a required degree of connectivity. In order to satisfy this goal, the allocation may be divided into two phases:

(a) verification phase, and

(b) selection phase.

The verification phase is carried out for each alternative service. When a request for an allocation arrives, the allocator needs to extract the requirements of the computation from the required server's joint (see Figure 1). Then the allocator invokes a PUSH_TC on the server's calendar and the calendars of the resources required by it using the off-line scheduler. An allocation request is sent to all the services that the server object needs to invoke. If a schedule feasibility verification holds for the local resource requirements and all the required services, a positive allocate answer is returned on behalf of the server to the initiator of the allocation request. Once a reservation is justified, the allocator can invoke the binder to create a semantic link between the initiator and the server object.

The selection phase chooses participants in the computation from the objects that answered positively in the verification phase. One selection approach is discussed in section 5. Another approach is given below. Here the selection uses the computation graph to choose a subgraph with the required connectivity. In order to support the selection, each positive answer from an allocator of an object includes a computation graph as viewed by that object. This graph is constructed from the answers of the servers invoked by this object and the resources required by this object. The two approaches are examined in MARUTI.

The allocation must be an off-line service, because it has an unbounded execution time.


2. Binder. The binder is responsible for "connecting" the justification links, as well as for verifying that the semantic relation is properly established. The resulting addresses are still not absolute, and only the loader will change them to be such. The binder installs the addresses it extracts from the directory service. Remote binding is done through agents, one at each side of the communication media, and binds are established from an object to a local agent. The semantic binding requires that the binder attach the proper mechanisms for parameter exchanges to each side of the semantic binding. A special task of the binder is to bind alternatives and replicas, attaching the proper voting mechanisms when needed. After execution terminates, an UNBIND is invoked off-line to remove the justifications. An object without a justification is deleted.

Special data structures can be constructed by binding passive objects to construct hierarchical data bases. This binding is independent of the allocation process.

3. Login service. The login server is the user interface of MARUTI. It is invoked by a LOG_IN command, to boot the directory owned by the user identified within the LOG_IN parameters. It thereafter executes as an off-line object, whose major task is command interpreting. Each logged-in user has an activated login service object executing throughout the login session. This server includes the command interpreter, the port driver, and the links to the user owned objects and to public objects. It is removed by a LOG_OUT command.

The command interpreter scans a conversation buffer, and after identifying a command delimiter, invokes the service requested in the command (or an exception handler in case there is an illegal pattern).

4. Name service. The name service uses the directory services and the file service in order to carry out its own service. The name server bridges the different name domains, from human oriented names (strings) to machine oriented names (addresses). At all these domains, when an identifier is forwarded, the name service provides the proper translation, if such an object exists.

The names should be of limited size; otherwise the search space for names is large and one cannot bound the time required to find an identifier. This limit on identifier size implies name reuse, because the number of available identifiers is not arbitrarily large.

5. Directory service. A directory is treated as an object. Each user's directory points to the objects owned by that user, using the owner justification links. When an owner deletes its justification for an object's existence, the loss of justification should propagate to the objects for which the now unjustified object serves as a justification. The protection mechanism is associated with each of the links between objects. Access to a specific object is enabled using a ticket-oriented mechanism.

The directory service access points are:

• DIRECTORY_SELECT.

• DIRECTORY_ENTRY_INSERT.

• DIRECTORY_ENTRY_REMOVE.

• DIRECTORY_LIST.

• OBJECT_CREATE.

• OBJECT_DELETE.

• OBJECT_LOCATE (i.e. FIND).
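The propagation of lost justifications described for the directory service can be pictured with a small Python sketch. The dictionary layout (each object mapped to the set of entities that justify its existence) and the function name `remove_justification` are hypothetical; the sketch only illustrates the cascading rule, not the directory service's real interface.

```python
def remove_justification(justifications, justifier, target):
    """Remove one justification link and delete every object left unjustified.

    justifications: dict mapping an object name to the set of names that
    justify its existence (an assumed representation).
    Returns the list of deleted objects, in deletion order.
    """
    deleted = []
    pending = [(justifier, target)]
    while pending:
        src, obj = pending.pop()
        links = justifications.get(obj)
        if links is None or src not in links:
            continue
        links.discard(src)
        if not links:
            # No justification is left, so the object is deleted and the
            # links it provided to other objects are withdrawn as well.
            del justifications[obj]
            deleted.append(obj)
            for other, sources in justifications.items():
                if obj in sources:
                    pending.append((obj, other))
    return deleted
```

An object that is still justified by another owner survives the cascade, which matches the idea that a directory may point to objects owned by others.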

6. File service. The service allows reading or writing elements of a passive object (in a file system one may consider a line or a character as such elements). The MARUTI file server provides the following SAPs:

• DISPLAY_CONTENT.

• GET_STATUS.

• FETCH_SUB-OBJECT (i.e. READ).

• INSERT_SUB-OBJECT (i.e. WRITE).

Figure 3: Local versus Remote Relation (a local relation links two objects directly; a remote relation passes through an agent at each side of the communication medium)

4 Execution and Distribution Considerations

4.1 Scheduling Queues

Each resource in the system is controlled by a server object. The server's calendar represents the guarantees given to user objects so far. A processor is treated as a resource, hence its server (the jobs scheduler) maintains its calendar. In this calendar, entries for on-line guaranteed jobs are maintained. At the end of an instance of a time constraint, the scheduler picks the next on-line instance of a time constraint from the calendar. If there is no on-line job to execute in the calendar, an off-line job may be executed. The off-line jobs are preemptable by nature. They may be preempted by the use of one of the awake SAPs, according to the next on-line instance of a time constraint. One may consider the calendar as a queue of on-line jobs, while a different queue is required for off-line jobs. The additional queue is required for two reasons. First, an off-line job cannot be reserved in the calendar. Second, we want to reduce search in the calendar to a minimum. Furthermore, we may apply different scheduling schemes to the different queues. For reasons of fairness and efficiency we have the following three queues for a resource:

1. The on-line time constraints' calendar.

2. The off-line for on-line jobs queue.

3. The off-line queue.
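The interplay of the three queues can be sketched in Python as follows. The class and method names are hypothetical, and the order in which the two off-line queues are served (off-line work for on-line jobs first) is an assumption made for this sketch; the text above does not fix that order.

```python
import heapq
from collections import deque


class ResourceScheduler:
    """Hypothetical dispatcher over the three queues of one resource."""

    def __init__(self):
        self.calendar = []                 # heap of (start_time, job): on-line instances
        self.offline_for_online = deque()  # off-line work done on behalf of on-line jobs
        self.offline = deque()             # ordinary off-line jobs

    def reserve(self, start_time, job):
        heapq.heappush(self.calendar, (start_time, job))

    def pick(self, now):
        """Return (job, run_until).

        run_until is the start of the next on-line instance, i.e. the time
        at which a dispatched off-line job has to be preempted (None if the
        calendar is empty or an on-line job was dispatched).
        """
        if self.calendar and self.calendar[0][0] <= now:
            return heapq.heappop(self.calendar)[1], None
        run_until = self.calendar[0][0] if self.calendar else None
        for queue in (self.offline_for_online, self.offline):  # assumed service order
            if queue:
                return queue.popleft(), run_until
        return None, run_until
```

Keeping the off-line jobs out of the calendar means that `pick` only ever inspects the head of the calendar, which reflects the stated goal of reducing search in the calendar to a minimum.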

4.2 Remote Services

Typically, the communication between two sites is established due to service requests from one object to another in a remote site. Virtual circuits are employed in the communication subsystem. We distinguish between two possibilities of communications: object migration and service requests. The latter method is under real-time constraints. After servicing the request, a response is sent back to the request originator.

Each object in a remote site must have a local representative in each of the sites it is expected to provide service for. These representatives are called agents. Each agent takes care of the reservation of a time interval in its object's calendar. The invocation relations described in Figure 2 can therefore be replaced by agents, as described in Figure 3.

Since the real-time properties must be preserved, the media involved in the communication must also be reserved. This implies that there is a calendar for each of the media sections between two nodes that has to be allocated.


The creation of agents for remote objects can be done in various ways. One way is automatic generation of agents after an object requests service from a remote object. The path to the latter must be specified and a capability ticket for the access presented. The operating system then creates the agent and the communication process is started. This is done, obviously, after the capability has been checked. Another way is the creation of agents prior to the execution of the requesting object.

Protection is also a concern of the communication service, since the media through which the requests and responses travel are unprotected. We need a mechanism to encode information and decode it properly in the destination site. Therefore, encryption techniques are also necessary.

The communication service is a part of the kernel that deals with the translation of elements in heterogeneous distributed systems or networks. After a syntactic link is established by the binder, the semantic link has a major role in the communication. For example, if one system sends an integer, the other system must receive an integer, regardless of how each machine represents it. The semantics of the transferred elements must be preserved and must agree at both end points of the communication. Thus, the agent translates the elements to be transmitted whenever needed, to agree with the representation at the destination site.
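As a minimal illustration of the integer example, the Python sketch below re-encodes an integer from the byte order of the sending machine to that of the receiving machine. Byte order is only one possible difference in representation, and the function name `agent_translate` and its parameters are assumptions made for this sketch rather than part of MARUTI.

```python
def agent_translate(payload, src_order, dst_order):
    """Re-encode a signed integer for the destination site.

    payload:   the integer as raw bytes in the sender's representation
    src_order: "little" or "big", the sender's byte order
    dst_order: "little" or "big", the receiver's byte order
    """
    value = int.from_bytes(payload, src_order, signed=True)
    return value.to_bytes(len(payload), dst_order, signed=True)
```

The value seen by the receiver is the value that was sent, even though the bytes on each machine differ; when both sites share a representation the translation leaves the payload unchanged.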

During execution time, when any service is invoked, its computation time and the communication transfer times are bounded. The computation time is known a priori. The transfer time is also bounded due to the timely allocated virtual circuits, as implemented in the allocation and scheduling schemes.

5 Job Acceptance in MARUTI

5.1 MARUTI After Boot

The image seen at a given locality (a processing node in a distributed environment) after the boot process terminates is characterized by the following.

1. The MARUTI kernel is loaded, and it stays core resident "forever" from the memory manager's point of view. The kernel includes the servers, buffers for devices (assumed to be DMA accessed), resource scheduler(s) and calendars, and space for on-line and off-line queues. The interrupt handler and the known interrupt tables and service routines are loaded. A system directory, which is a list of pointers and tickets to a set of publicly accessible objects owned by the system, is loaded as well. In order to support acceptance of incoming invocation requests, designated resources are reserved periodically as an idle "server". A mechanism of priority exchange is being examined to assure a starvation-free response, as well as sufficient on-line/off-line queue enablements.

In addition to the above idle server, two object types are loaded for execution: login-servers are attached to control stations (TTY/CONSOLE) and the periodic services of the local time server are executing.

2. The periodic service access points (SAPs) of the time server are in their resource calendars after an INSERT_TC has been carried out. Since these are interrupt driven services, each of them is initially at an idle state, which will be changed by the interrupt handler to an active state. The synchronization service starts executing after a FORUM and QUORUM protocol was carried out to select remote servers that participate in the distributed synchronization algorithm.

3. Each login-server is queued in the off-line queue. It only allows the LOG_IN SAP to be invoked. Since the login-server at this state has a device orientation, we denote this mode as a device mode.

5.2 MARUTI After LOG_IN

After a LOG_IN has successfully completed, the login-server changes its mode to an owner mode. It does so after loading the proper owner directory. Such a directory is a list of pointers and tickets to the owner's objects. In addition, the directory may point to objects owned by others for which this owner has acquired access tickets sometime in the past. At this stage the login-server acts as a command line interpreter (CLI), allowing activation and manipulation of objects with the proper authorization checks.

Let us now classify the executable objects into groups and discuss their execution. A server object is characterized by being subjected to operation of other objects, and not acting on any other executable object. An actor object is characterized by not being subjected to any operation of other objects and acting on other executable objects. An agent object is characterized by being subjected to operations of other objects and acting on other executable objects. We start the execution discussion with the server object.

5.3 Executing a Server Object

In order to execute a server object, the following steps are taken.

• Resources and kernel servers for this object must be allocated.

• Semantic links to required kernel servers must be established.

• The loader is either invoked, or set by INSERT_TC to be invoked in the future, to load the server object (see footnote 1).

The allocator is viewed as an application level service, and therefore it must first be loaded itself.

1. The loader image is queued to the on-line/off-line queue with parameters that describe whom to load (the allocator) and its parameter block (whom to allocate):

off-on-load(allocate(xyz.SAP_abc, time constraint)).

This results in loading the allocator to the on-line/off-line queue, with parameters that relate to object xyz, at its SAP_abc access point, and a time constraint which describes the allowed occurrence window. The loader image (in a design that allows multi-loader multi-allocator use) uses a designated memory space in which these allocators reside. This space is bounded, and hence in the above case an expensive context switch might be required.

2. The binder need not be invoked, because we assume that the semantic links of the allocator to the kernel servers are pre-established, and are activated by the loader. If that is not the case, then the allocator must invoke a binding and exit, through a binder SAP that exits itself via invoking the allocator to continue.

3. The allocator then invokes the schedulability verification tests in the resources' off-line scheduling SAPs. If reservations are not confirmed, then a negative response is sent to the initiator (in this simple case a message is sent to the proper display driver; the next section describes the more complicated case of invoking a REPLY SAP of the initiating allocator with a negative parameter). In addition, if there is a subset of resources for which reservation has been confirmed, UNLOAD invocations are activated to the members of this subset.

4. For the case of positive reservations by the schedulability verifications, there are two possibilities. The simple case is when the begin-time of the time constraint's window of occurrence is soon enough (as decided by the memory management policy), in which case the allocated object is loaded now. Otherwise, a future load execution is reserved with the INSERT_TC of the memory manager (which might execute on the same CPU, depending on the architecture). The reservation might have reduced the laxity of the original time constraint, so the positive REPLY includes the final time constraint and the resource subtree which was actually allocated.

Footnote 1: If the implementation includes a memory manager object, the loader is to be inserted in its proper page calendar, and a CHANGE_TC_STATE of the object's joint is to be inserted in the object's resources.


The above procedure ends with a negative or a positive REPLY. In the positive case, either reservations are made both for the server and for the loader, or the server is loaded into memory in an active state for execution.
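The all-or-nothing character of the reservation step for a server object can be sketched in Python as follows. The `Resource` class with its slot-based calendar, the `unload` method standing in for the UNLOAD invocation, and the shape of the returned REPLY tuple are all assumptions introduced for illustration.

```python
class Resource:
    """Hypothetical resource server with a trivial slot-based calendar."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = set(free_slots)

    def reserve(self, slot):
        # Stand-in for the schedulability verification of one resource.
        if slot in self.free_slots:
            self.free_slots.remove(slot)
            return True
        return False

    def unload(self, slot):
        # Stand-in for the UNLOAD invocation: give the slot back.
        self.free_slots.add(slot)


def allocate_server(resources, slot):
    """Reserve `slot` on every resource, or release the confirmed subset and refuse."""
    confirmed = [r for r in resources if r.reserve(slot)]
    if len(confirmed) < len(resources):
        for r in confirmed:
            r.unload(slot)
        return ("REPLY", False, [])
    return ("REPLY", True, [r.name for r in confirmed])
```

A negative REPLY leaves every calendar as it was before the request, so a rejected job does not consume guarantees that could be given to later requests.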

The following section describes how the above principle is extended to deal with objects that require the services of other objects.

5.4 Executing an Actor (or Agent) Object

In order to execute an actor or an agent object, the following steps are taken.

• Resources and kernel servers for this object must be allocated.

• Resources and kernel servers for the required servers and agents of this object must be allocated.

• Semantic links between all participants must be established.

• The loader is either invoked, or set by INSERT_TC to be invoked in the future, to load all the participating objects.

This can be summarized as an extension of the previously proposed scenario, with the following differences.

1. The loader image is queued to the on-line/off-line queue exactly as above.

2. The binder is invoked to establish the semantic links to the required objects that are to be used. The allocator must invoke a binding and exit, through a binder SAP that exits itself via invoking the allocator to continue.

3. The allocator then invokes the schedulability verification tests in its required resources' off-line scheduling SAPs. If reservations are not confirmed, then a negative response is sent to the requiring allocator (that invoked this one), invoking the requiring allocator's REPLY SAP with a negative parameter.

4. For the case of positive reservations by the schedulability verifications, this allocator then invokes the allocation of its required server/agent objects, and exits.

5. Each of the invoked allocators then performs accordingly, exiting after a REPLY of this allocator is invoked by it.

6. After all the required REPLY invocations have arrived at this allocator from the allocators it has invoked, it invokes the REPLY of its requiring allocator with either a positive or a negative answer. The positive answer includes the updated time constraint (as a result of the laxity reduction) and the subgraph of computation as seen by this allocator. The subgraph is composed of its own resources and its required objects' answers.
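The way answers are collected along the invocation graph can be sketched with a recursive Python function. This is a simplification under stated assumptions: the real allocators exit and are re-entered through their REPLY SAPs rather than making synchronous calls, the update of the time constraint is omitted, and the graph of required objects is assumed to be acyclic. The names `allocate`, `requires` and `local_ok` are hypothetical.

```python
def allocate(obj, requires, local_ok):
    """Return (accepted, subgraph) for `obj` and everything it requires.

    requires: dict mapping an object to the list of objects it invokes
    local_ok: callable telling whether an object's own resources can be reserved
    subgraph: dict mapping each participating object to the objects it invokes
    """
    if not local_ok(obj):
        return False, None            # negative REPLY to the requiring allocator
    subgraph = {obj: []}
    for dep in requires.get(obj, []):
        accepted, sub = allocate(dep, requires, local_ok)
        if not accepted:
            return False, None        # one negative answer makes the whole answer negative
        subgraph[obj].append(dep)
        subgraph.update(sub)          # merge the subgraph as seen by the invoked allocator
    return True, subgraph
```

For an actor that requires an agent and a server, where the agent in turn requires a second server, a positive answer carries the whole four-node subgraph back to the initiator.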

6 Conclusion

The MARUTI operating system is a hard real-time operating system that directly supports the requirements of distributed and fault tolerant systems. This operating system provides a guarantee to each of its accepted real-time jobs to meet the deadline associated with the job. MARUTI allocates its resources in a manner that supports the fault tolerance goals of its accepted jobs. Semantic links are used to establish the proper relations between objects, and calendars introduce the time notion explicitly into the allocation scheme.

Temporal determinism in MARUTI is achieved with encapsulation of the services within objects, along with the use of explicit time expressions both for start-time and for finish-time constraints on periodic as well as on aperiodic computations.

The first phase of MARUTI's implementation is carried out on top of a modified UNIX operating system. The automatic services (e.g. scheduling and interrupt) are replaced by MARUTI's services, while some of the UNIX system calls are used as application level services. This phase is implemented on a local network of SUN workstations connected via a LAN.

At the beginning of the first phase of implementation, the interrupt handler and the scheduler have been replaced. The time service employed is a slightly modified version of [6]. The next step of implementation integrates the allocator and binder using the UNIX name and directory services. Throughout the first phase of implementation of MARUTI, the file service of UNIX is employed to load the kernel and its objects.

7 Related Publications

1. "Real-Time Programs: Design Implementation and Validation-A Survey," A.K. Agrawala, S.-T. Levi, TR 1837, Department of Computer Science, University of Maryland, April, 1987.

2. "On Real-Time Operating Systems" A.K. Agrawala, S.-T. Levi, TR 1838, Department of Computer Science, University of Maryland, April, 1987.

3. "On Real-Time Systems Using Local Area Networks," S. K. Tripathi, S.-T. Levi, TR 1892, Department of Computer Science, University of Maryland, July, 1987.

4. "Objects Architecture: A Comprehensive Design Approach for Real-Time, Distributed, Fault-Tolerant, Reactive Operating Systems," A.K. Agrawala, S.-T. Levi, TR 1915, Department of Computer Science, University of Maryland, September, 1987.

5. "An Analysis of a Buddy System For Fault Tolerance," S.K. Tripathi, Finkel, TR 1924, Department of Computer Science, University of Maryland, September, 1987.

6. "On Fault Tolerance in Manufacturing Systems" S.K. Tripathi, P. Jalote, P. Chintamaneni, Shieh, TR 1939, Department of Computer Science, University of Maryland, October, 1987.

7. "Temporal Relations and Structures in Real-Time Operating Systems," A.K. Agrawala, S.-T. Levi, TR 1954, Department of Computer Science, University of Maryland, December, 1987.

8. "Scheduling in Real-Time Distributed Systems-A Review," A.K. Agrawala, S.K. Tripathi, Yuan, TR 1955, Department of Computer Science, University of Maryland, December, 1987.

9. "Scheduling Tasks in a Real-Time System," A.K. Agrawala, S.K. Tripathi, P.R. Chintamaneni, TR 1991, Department of Computer Science, University of Maryland, February, 1988.

10. "An Object Architecture for Hard Real-Time Operating System," J. Nehmer, TR 2003, Department of Computer Science, University of Maryland, March, 1988.

11. "Introducing the MARUTI Hard Real-Time Operating System," A.K. Agrawala, S.K. Tripathi, S.-T. Levi, TR 2010, Department of Computer Science, University of Maryland, April, 1988.

12. "Allocation of Real-Time Computations under Fault Tolerance Constraints," A.K. Agrawala, S.-T. Levi, D. Mosse, TR 2018, Department of Computer Science, University of Maryland, May, 1988.

13. "A Structuring Framework for Distributed Operating Systems," J. Nehmer, TR 2079, Department of Computer Science, University of Maryland, July, 1988.

References

[1] Agrawala A. K. and Levi S.-T., "Objects Architecture for Real-Time, Distributed, Fault Tolerant Operating Systems," IEEE Workshop on Real-Time Operating Systems, Cambridge, MA, July 1987.


[2] Booch G., "Object-Oriented Development," IEEE Transactions on Software Engineering, Vol SE-12, No. 2, pp. 211-221, Feb., 1986.

[3] Cooper E. C., "Replicated Procedure Call," ACM Operating Systems Review, Vol. 20, No. 1, pp. 44-55, Jan. 1986.

[4] Harel D. and Pnueli A., "On The Development of Reactive Systems," Weizmann Institute of Science, Rehovot, Israel, 1985.

[5] Knuth, D. E., The Art of Computer Programming, (Sorting and Searching, Volume 3), Addison-Wesley Publishing Company, Reading, Massachusetts, 1973.

[6] Lamport L., "Synchronizing Time Servers," SRC report No 18, DEC SRC, Palo Alto, CA, June, 1987.

[7] Levi S.-T. and Agrawala A. K., "Objects Architecture: A Comprehensive Design Approach for Real-Time, Distributed, Fault-Tolerant, Reactive Operating Systems," CS-TR-1915, Technical Report, Department of Computer Science, University of Maryland, College Park, Maryland, Sept., 1987.

[8] Levi S.-T. and Agrawala A. K., "Temporal Relations and Structures in Real-Time Operating Systems," CS-TR-1954, Technical Report, Department of Computer Science, University of Maryland, College Park, Maryland, Dec., 1987.

[9] Mancini L., "Modular Redundancy in a Message Passing System," IEEE Transactions on Software Engineering, Vol. SE-12, No. 1, pp. 79-86, Jan. 1986.

[10] Stankovic J. A. (editor), "Real-Time Computing Systems: The Next Generation," March 1987 CMU Workshop on Fundamental Issues in Distributed Real-Time Systems, Carnegie Mellon University, Pittsburgh, Pennsylvania, Nov. 23, 1987.
