FUTEX (Fast Userspace muTEX)

Seminar Report


    CHAPTER NO. 1

    INTRODUCTION

In recent years, that is, in the past five years, Linux has seen significant growth as a server operating system and has been successfully deployed in the enterprise for Web, file and print serving. With the advent of kernel version 2.4, Linux has seen a tremendous boost in scalability and robustness, which makes it feasible to deploy even more demanding enterprise applications such as high-end databases, business intelligence software, application servers, and so on. As a result, whole enterprise business suites and middleware such as SAP, WebSphere and Oracle are now available on Linux. For these enterprise applications to run efficiently on Linux, or on any other operating system, the OS must provide the proper abstractions and services. These enterprise applications and application suites are increasingly built as multi-process / multi-threaded applications, and are often a collection of multiple independent subsystems.

Despite functional variations between these applications, they often need to communicate with each other and sometimes need to share a common state. Examples of this are database systems, which typically maintain shared I/O buffers in user space. Access to such shared state must be properly synchronized. Allowing multiple processes to access the same resources in a time-sliced manner, or potentially concurrently in the case of multiprocessor systems, can cause many problems, because data consistency must be maintained, true temporal dependencies must be respected, and each thread must properly release the resource when it has completed its action. Synchronization can be established through locks. There are mainly two types of locks: exclusive locks, which allow only a single user to access the protected entity, and shared locks, which implement multiple-reader / single-writer semantics. Synchronization implies a shared state, indicating that a particular resource is available or busy, and a means to wait for its availability. The latter can be accomplished either through busy-waiting or through an explicit or implicit call to the scheduler.


    CHAPTER NO. 2

CONCURRENCY IN THE LINUX OPERATING SYSTEM

As different processes interact with each other, they may often need to access and modify shared sections of code, memory locations and data. The section of code belonging to a process or thread which manipulates a variable that is also being manipulated by another process or thread is commonly called a critical section. Proper synchronization mechanisms serialize access to the critical section. Processes operate within their own virtual address space and are protected by the operating system from interference by other processes. By default a user process cannot communicate with another process unless it makes use of secure, kernel-managed mechanisms. There are many times when processes need to share common resources or synchronize their actions. One possibility is to use threads, which by definition can share memory within a process. This option is not always possible (or wise) due to the many disadvantages which can be experienced with threads. Methods of passing messages or data between processes are therefore required.

In traditional UNIX systems the basic mechanisms for synchronization were the System V IPC (inter-process communication) facilities, such as semaphores, message queues and sockets, and the file locking mechanisms flock() and fcntl(). Message queues (msgqueues) consist of a linked list within the kernel's address space; messages are added to the queue sequentially and may be retrieved from the queue in several different ways. Semaphores are counters used to control access to shared resources by multiple processes. They are most often used as a locking mechanism to prevent processes from accessing a particular resource while another process is performing operations on it. Semaphores are implemented as sets, though a set may have a single member. Shared memory is a mapping of an area of memory into the address space of more than one process; this is the fastest form of IPC, as processes do not subsequently need access to kernel services in order to share data. fcntl() locking is implemented directly via the kernel. This works without problems on a local machine, but over NFS it requires locking support in the NFS server. The (old) user-space nfsd does not support locking, while the kernel NFS server (knfsd), which comes with Linux 2.2 and newer, does. fcntl() locking will fail if your NFS server does not support locking, so choosing the optimal locking technique depends not only on the client machine but also on the NFS server machine.
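As a concrete illustration of kernel-arbitrated file locking, a minimal fcntl() advisory-locking sketch looks like the following (error handling omitted; the descriptor fd is assumed to be open with write access):

#include <fcntl.h>
#include <unistd.h>

/* Acquire an exclusive (write) lock on the whole file, blocking until granted. */
static int lock_file(int fd)
{
    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,          /* 0 means "to end of file", i.e. the whole file */
    };
    return fcntl(fd, F_SETLKW, &fl);    /* F_SETLKW sleeps until the lock is granted */
}

/* Release the lock again. */
static int unlock_file(int fd)
{
    struct flock fl = {
        .l_type   = F_UNLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,
    };
    return fcntl(fd, F_SETLK, &fl);
}

Note that both the lock and the unlock paths are system calls regardless of contention, which is exactly the per-operation overhead that the user-level locking discussed later tries to avoid.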


The concept of semaphores was proposed by the famous computer scientist Edsger Wybe Dijkstra. A semaphore supports two basic operations, UP and DOWN, and its initial value is taken as 1. The pseudocode given by Dijkstra for these operations is as follows:

UP()
{
    Semaphore++;
}

DOWN()
{
    while (Semaphore == 0)
        ;              /* busy-wait until the semaphore becomes non-zero */
    Semaphore = 0;     /* claim the (binary) semaphore */
}

The actual Linux implementation of semaphores in the kernel codes UP and DOWN as system calls. The UP system call, after incrementing the value of the semaphore, sends a wakeup signal to all the processes which are waiting for the semaphore value to become 1 so that they can acquire it. (Waking up means that the process which does UP on the semaphore also changes the waiters' task_struct state field to TASK_RUNNING.) DOWN has behaviour similar to that proposed by Dijkstra: it simply checks whether the semaphore value is 1, and otherwise sleeps within the while() loop.

Nutt gives the pseudocode as

DOWN(s): [ while (s == 0) { wait }; s = s - 1; ]

The code within square brackets is indivisible, and the part within curly braces can be interrupted. To understand how these functions actually achieve mutual exclusion, consider an example where two processes, A and B, try to access the critical section. As the initial value of the semaphore is one, when the two processes call DOWN() concurrently, only one will succeed in completing the DOWN() operation. Suppose process A starts executing DOWN() first. At this point process B cannot preempt A, because DOWN() is a system call. Process A sees that the value of the semaphore is one, atomically decrements it to zero, and comes out of DOWN() without blocking. Now process B can preempt A, because A has come out of the system call. B starts executing DOWN(), finds the value of the semaphore to be zero at this moment, and hence goes to sleep without coming out of the system call.

When process A has completed, it issues a wakeup call and B starts executing. That is the whole story of Linux semaphores. (These IPC mechanisms are coded as C routines in /usr/src/linux/ipc, written by Krishna Balasubramaniam.)

Thus we see that these mechanisms expose an opaque handle to a kernel object that naturally provides the shared state and atomic operations in the kernel. Services must be requested through system calls (e.g. semop()). The drawback of this approach is that every lock access requires a system call. When locks have low contention rates, the system call can constitute a significant overhead. A process may operate in one of two modes, known as 'user' mode and 'system' (or kernel) mode. A single process may switch between the two modes, i.e. they may be different phases of the same process. Processes defaulting to user mode include most application processes; these are executed within an isolated environment provided by the operating system, such that multiple processes running on the same machine cannot interfere with each other's resources. A user process switches to kernel mode when it makes a system call, generates an exception (fault), or when an interrupt occurs (e.g. the system clock). At this point the kernel is executing on behalf of the process: at any one time during its execution a process runs in its own context, and the kernel runs in the context of the currently running process.
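To make this cost concrete, the following minimal sketch uses a System V semaphore as a lock; both the acquire and the release go through the semop() system call even when there is no contention (error handling omitted):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; };           /* required by semctl(2); glibc does not define it */

/* Create a one-element semaphore set initialised to 1 (unlocked). */
static int sysv_lock_create(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    union semun arg = { .val = 1 };
    semctl(semid, 0, SETVAL, arg);
    return semid;
}

static void sysv_lock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    semop(semid, &op, 1);           /* system call, even in the uncontended case */
}

static void sysv_unlock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = 0 };
    semop(semid, &op, 1);           /* system call again */
}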

One solution to this problem is to deploy user-level locking, which avoids some of the overhead associated with purely kernel-based locking mechanisms. It relies on a user-level lock located in a shared memory region and modified through atomic operations to indicate the lock status. Only the contended case requires kernel intervention. The exact behaviour and the obtainable performance are directly affected by how and when the kernel services are invoked, and by the primitives used, such as the shared memory region and the atomic operations.

In this seminar I describe a particular fast user-level locking mechanism called futexes, which was developed in the context of the Linux operating system. It consists of two parts: a user library and a kernel service that has been integrated into the Linux kernel distribution in version 2.5.7.


    CHAPTER NO. 3

    LINUX USERSPACE LEVEL LOCKING

    3.1. HISTORY AND IMPLEMENTATION:

Having stated the requirements, we now proceed to describe the basic general implementation issues. As noted before, a futex is a fast user-level locking mechanism, and hence the name Fast Userspace muTEX. Futexes are very basic and lend themselves well to building higher-level locking abstractions such as POSIX mutexes. Most programmers will in fact not use futexes directly but will instead rely on system libraries built on them, such as the NPTL pthreads implementation. A futex is identified by a piece of memory which can be shared between different processes. Fast user-level locking distinguishes two cases: the uncontended case and the contended case.

The uncontended case should be efficient and should avoid system calls by all means. In the contended case we are willing to perform a system call to block in the kernel. Avoiding system calls in the uncontended case requires a shared state in user space accessible to all participating processes/tasks. This shared state, referred to as the user lock, indicates the status of the lock, i.e. whether the lock is held or not and whether there are waiting tasks or not. This is in contrast to the System V IPC mechanisms, which merely export a handle to the user and perform all operations in the kernel. The user lock is located in a shared memory region that was created via shmat() or mmap().

As a result, the lock can be located at different virtual addresses in different address spaces. In the uncontended case, the application atomically changes the lock status word without entering the kernel. Hence atomic operations such as atomic_inc(), atomic_dec(), cmpxchg(), and test_and_set() are necessary in user space.
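On the user-space side, such atomic operations are typically provided by compiler builtins or small assembler fragments rather than by the kernel. A minimal sketch using the GCC/Clang __atomic builtins (the helper names are illustrative, not part of any futex API):

/* Atomically decrement *v and return the new value. */
static inline long atomic_dec_return(long *v)
{
    return __atomic_sub_fetch(v, 1, __ATOMIC_ACQUIRE);
}

/* Atomically increment *v and return the new value. */
static inline long atomic_inc_return(long *v)
{
    return __atomic_add_fetch(v, 1, __ATOMIC_RELEASE);
}

/* Compare-and-exchange: if *v == old, store newval; returns non-zero on success. */
static inline int atomic_cmpxchg(long *v, long old, long newval)
{
    return __atomic_compare_exchange_n(v, &old, newval, 0,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}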

In the contended case, the application needs to wait for the release of the lock, or needs to wake up a waiting task in the case of an unlock operation. In order to wait in the kernel, a kernel object is required that has wait queues associated with it. The wait queues provide the queueing and scheduling interactions. Of course, the aforementioned IPC mechanisms can be used for this purpose. However, these objects still imply a heavyweight object that requires a priori allocation and often does not precisely provide the required functionality. Another alternative that is commonly deployed is spinlocks, where the task spins on the availability of the user lock until it is granted. To avoid wasting too many CPU cycles, the task yields the processor occasionally.

It is desirable for the user lock to be handle-free. In other words, instead of handling an opaque kernel handle, requiring initialization and cross-process global handles, it is desirable to address locks directly through their virtual address. As a consequence, kernel objects can be allocated dynamically and on demand, rather than a priori. A lock, though addressed by a virtual address, can be identified conceptually through its global lock identity, which we define by the memory object backing the virtual address and the offset within that object. It is given by the tuple [B, O]. Since B represents the memory object backing the kernel object, it can be any of three fundamental types:

(a) anonymous memory,

(b) a shared memory segment, and

(c) memory-mapped files.

While (b) and (c) can be used between multiple processes, (a) can only be used between threads of the same process. Utilizing the virtual address of the lock as the kernel handle also provides an integrated access mechanism that ties the virtual address automatically to its kernel object. Despite the atomic manipulation of the user-level lock word, race conditions can still exist, because the sequence of lock-word manipulation and system calls is not atomic. This has to be resolved properly within the kernel to avoid deadlock and improper functioning.

Next, the user lock object of futexes is described. For the purpose of this discussion a general opaque datatype ulock_t is defined to represent the user-level lock. At a minimum it requires a status word:

typedef struct ulock_t {
    long status;
} ulock_t;

Here, a single long field indicates the status of the user lock object. We assume that a shared memory region has been allocated either through shmat() or through mmap() and that any user locks are allocated within this region. Again, the addresses of the same lock need not be the same across all participating address spaces. The basic semaphore functions UP() and DOWN() can be implemented as follows:

static inline int
usema_down(ulock_t *ulock)
{
    if (!__ulock_down(ulock))
        return 0;
    return sys_ulock_wait(ulock);
}

static inline int
usema_up(ulock_t *ulock)
{
    if (!__ulock_up(ulock))
        return 0;
    return sys_ulock_wakeup(ulock);
}

The __ulock_down() and __ulock_up() helpers provide the atomic decrement and increment operations on the lock status word. A non-positive count (status) indicates that the lock is not available; in addition, a negative count can indicate the number of waiting tasks in the kernel. If contention is detected, i.e. if the atomic operation leaves ulock->status negative, the code above falls back into the kernel: usema_down() blocks in sys_ulock_wait(), and usema_up() wakes a waiter via sys_ulock_wakeup().
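A plausible sketch of these fast-path helpers, assuming the counting scheme just described (1 = free, 0 = locked, negative = locked with waiters); the real library code handles additional corner cases such as resetting the count on wakeup:

/* Returns 0 if the lock was taken uncontended, non-zero if the caller must block. */
static inline int __ulock_down(ulock_t *ulock)
{
    /* 1 -> 0 means acquired; a negative result means contention. */
    return __atomic_sub_fetch(&ulock->status, 1, __ATOMIC_ACQUIRE) < 0;
}

/* Returns 0 if there were no waiters, non-zero if the kernel must wake one up. */
static inline int __ulock_up(ulock_t *ulock)
{
    /* 0 -> 1 means released with no waiters; anything below 1 means waiters exist. */
    return __atomic_add_fetch(&ulock->status, 1, __ATOMIC_RELEASE) < 1;
}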


    3.2 PREVIOUS IMPLEMENTATIONS:

    One early design suggested was the explicit allocation of a kernel object and the

    export of the kernel object address as the handle. The kernel object was comprised of a wait

    queue and a unique security signature. On every wait or wakeup call, the signature would be

    verified to ensure that the handle passed indeed was referring to a valid kernel object. The

    disadvantages of this approach have been mentioned in section 2, namely that a handle needs to

    be stored in ulock_t and that explicit allocation and deallocation of the kernel object are required.

    Furthermore, security is limited to the length of the key and hypothetically could be guessed.

Another prototype implementation, known as ulocks, implements general user semaphores with both fair and convoy-avoidance wakeup policies.

Mutually exclusive locks are regarded as a subset of the user semaphores. The prototype also provides multiple reader/single writer locks (rwlocks). The user lock object ulock_t consists of a lock word and an integer indicating the required number of kernel wait queues.

User semaphores and exclusive locks are implemented with one kernel wait queue, and multiple reader/single writer locks are implemented with two kernel wait queues. This implementation separates the lock word from the kernel wait queues and other kernel objects, i.e. the lock word is never accessed from the kernel on the time-critical wait and wakeup code path. Hence the state of the lock and the number of waiting tasks in the kernel are all recorded in the lock word. For exclusive locks, standard counting, as described in the general ulock_t discussion, is implemented. As with general semaphores, a positive number indicates the number of times the semaphore can be acquired, 0 or less indicates that the lock is busy, and the absolute value of a negative number indicates the number of waiting tasks in the kernel.

The premature wakeup call is handled by implementing the kernel-internal wait queues using kernel semaphores (struct semaphore) which are initialized with a value of 0. A premature wakeup call, i.e. one with no pending waiter yet, simply increases the kernel semaphore's count to 1. Once the pending wait arrives, it simply decrements the count back to 0 and exits the system call without waiting in the kernel. All the wait queues (kernel semaphores) associated with a user lock are encapsulated in a single kernel object. In the rwlocks case, the lock word is split into three fields: write locked (1 bit), writers waiting (15 bits), and readers (16 bits). If write-locked, the readers field indicates the number of tasks waiting to read the lock; if not write-locked, it indicates the number of tasks that have acquired read access to the lock.

Writers block on a first kernel wait queue, while readers block on a second kernel wait queue associated with a ulock. To wake up multiple pending read requests, the number of tasks to be woken up is passed through the system call interface. To implement rwlocks and ca-locks, atomic compare-and-exchange support is required; unfortunately, on older 386 platforms that is not the case.
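The described split of the 32-bit rwlock word can be expressed with masks and shifts along the following lines (an illustrative layout, not the actual ulocks source):

#include <stdint.h>

/* Illustrative layout: bit 31 = write-locked, bits 16..30 = writers waiting,
   bits 0..15 = readers (holding the lock, or waiting if write-locked). */
#define RW_WRITE_LOCKED   0x80000000u
#define RW_WRITERS_SHIFT  16
#define RW_WRITERS_MASK   0x7fff0000u
#define RW_READERS_MASK   0x0000ffffu

static inline int rw_is_write_locked(uint32_t w)      { return (w & RW_WRITE_LOCKED) != 0; }
static inline unsigned rw_writers_waiting(uint32_t w) { return (w & RW_WRITERS_MASK) >> RW_WRITERS_SHIFT; }
static inline unsigned rw_readers(uint32_t w)         { return w & RW_READERS_MASK; }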

The kernel routines must identify the kernel object that is associated with the user lock. Since the lock can be placed at different virtual addresses in different processes, a lookup has to be performed. In the common fast lookup, the virtual address of the user lock and the address space are hashed to a kernel object. If no hash entry exists, the proper global identity [B, O] of the lock must be established. For this we first scan the calling process's vma list for the vma containing the lock word and its offset.

The global identity is then looked up in a second hash table that links global identities with their associated kernel objects. If no kernel object exists for this global identity, one is allocated, initialized and added to the hash tables. The close() function associated with a shared region holding kernel objects is intercepted, so that kernel objects are deleted and the hash tables are cleaned up once all attached processes have detached from the shared region. While this implementation provides for all the requirements, the kernel infrastructure of multiple hash tables and lookups was deemed too heavyweight. In addition, the requirement for compare-and-exchange was also seen to be restrictive.


    CHAPTER NO. 4

    FUTEXES

    4.1 WHAT IS FUTEX:

The Linux kernel provides futexes ('Fast Userspace muTexes') as a building block for fast userspace locking and semaphores. Futexes were designed and worked on by Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnar (Red Hat) and Rusty Russell (IBM Linux Technology Center). Futexes are very basic and lend themselves well to building higher-level locking abstractions such as POSIX mutexes. Initial futex support was merged in Linux 2.5.7, but with different semantics from those described below; the current semantics are available from Linux 2.5.40 onwards.

A futex is identified by a piece of memory which can be shared between different processes. In these different processes, it need not have identical addresses. In its bare form, a futex has semaphore semantics: it is a counter that can be incremented and decremented atomically, and processes can wait for the value to become positive. Futex operation is entirely in userspace for the non-contended case; the kernel is only involved to arbitrate the contended case. As any sane design will strive for non-contention, futexes are also optimised for this situation. In its bare form, a futex is an aligned integer which is only touched by atomic assembler instructions.

Processes can share this integer over mmap(), via shared segments, or because they share memory space, in which case the application is commonly called multithreaded. There are three key points of the original futex implementation which was added to the 2.5.7 kernel:

    1. We use a unique identifier for each futex (which can be shared across different address spaces,

    so may have different virtual addresses in each): this identifier is the struct page pointer and the

    offset within that page. We increment the reference count on the page so it cannot be swapped

    out while the process is sleeping.

2. The structure indicating which futex the process is sleeping on is placed in a hash table, and is created upon entry to the futex syscalls on the process's kernel stack.


3. The contraction of 'Fast Userspace muTEX' into 'futex' gave a simple, unique identifier for the section of code and the function names used.

    4.2 DESCRIPTION:

The Linux kernel provides futexes ('Fast Userspace muTexes') as a building block for fast userspace locking and semaphores. Futexes are very basic and lend themselves well to building higher-level locking abstractions such as POSIX mutexes.

This section does not set out to document all design decisions but restricts itself to issues relevant for application and library development. Most programmers will in fact not be using futexes directly but will instead rely on system libraries built on them, such as the NPTL pthreads implementation.

A futex is identified by a piece of memory which can be shared between different processes. In these different processes, it need not have identical addresses. In its bare form, a futex has semaphore semantics: it is a counter that can be incremented and decremented atomically; processes can wait for the value to become positive.

Futex operation is entirely in userspace for the non-contended case. The kernel is only involved to arbitrate the contended case. As any sane design will strive for non-contention, futexes are also optimised for this situation. In its bare form, a futex is an aligned integer which is only touched by atomic assembler instructions. Processes can share this integer over mmap(), via shared segments, or because they share memory space, in which case the application is commonly called multithreaded.

    4.3 SEMANTICS:

Any futex operation starts in userspace, but it may be necessary to communicate with the kernel using the futex(2) system call.

To 'up' a futex, execute the proper assembler instructions that will cause the host CPU to atomically increment the integer. Afterwards, check whether it has in fact changed from 0 to 1, in which case there were no waiters and the operation is done. This is the non-contended case, which is fast and should be common.

In the contended case, the atomic increment changed the counter from -1 (or some other negative number). If this is detected, there are waiters. Userspace should now set the counter to 1 and instruct the kernel to wake up any waiters using the FUTEX_WAKE operation.


Waiting on a futex, to 'down' it, is the reverse operation. Atomically decrement the counter and check whether it changed to 0, in which case the operation is done and the futex was uncontended. In all other circumstances, the process should set the counter to -1 and request that the kernel wait for another process to up the futex. This is done using the FUTEX_WAIT operation.

The futex system call can optionally be passed a timeout specifying how long the kernel should wait for the futex to be upped. In this case, the semantics are more complex and the programmer is referred to futex(2) for more details. The same holds for asynchronous futex waiting.
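A minimal sketch of this up/down protocol on Linux follows. glibc provides no futex() wrapper, so the raw system call is used; the helper names futex_down()/futex_up() are illustrative, and the handling of signals, timeouts and the known subtle races of this literal counter protocol (which production libraries avoid by using compare-and-exchange, see Chapter 7) is deliberately simplified:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The futex word: 1 = available, 0 = locked (no waiters), -1 = locked with waiters.
   Usage: futex_down(&futex_word); ...critical section...; futex_up(&futex_word); */
static int futex_word = 1;

static long sys_futex(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void futex_down(int *f)
{
    /* Fast path: 1 -> 0 means we own the futex without entering the kernel. */
    while (__atomic_sub_fetch(f, 1, __ATOMIC_ACQUIRE) != 0) {
        /* Contended: flag waiters and sleep until some process ups the futex. */
        __atomic_store_n(f, -1, __ATOMIC_SEQ_CST);
        sys_futex(f, FUTEX_WAIT, -1);   /* sleeps only while *f still reads -1 */
    }
}

static void futex_up(int *f)
{
    /* Fast path: 0 -> 1 means there were no waiters. */
    if (__atomic_add_fetch(f, 1, __ATOMIC_RELEASE) == 1)
        return;
    /* There were waiters: make the futex available and wake one of them. */
    __atomic_store_n(f, 1, __ATOMIC_SEQ_CST);
    sys_futex(f, FUTEX_WAKE, 1);
}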

    4.4 READ / WRITE LOCKS:

We considered an implementation of FUTEX_READ_DOWN et al., which would be similar to the simple mutual exclusion locks, but before adding these to the kernel, Paul Mackerras suggested a design for creating read/write locks in userspace by using two futexes and a count: fast userspace read/write locks, or "furwocks". This implementation provides the benchmark that any kernel-based implementation has to beat in order to justify its inclusion as a first-class primitive, which can be done by adding new valid op values.

    4.5 OPERATIONS TO BE PERFORMED:

• FUTEX_WAIT: This operation atomically verifies that the futex address uaddr still contains the value val, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. The arguments uaddr2 and val3 are ignored.

Example: for futex(7), this call is executed if decrementing the count gave a negative value (indicating contention), and it will sleep until another process releases the futex and executes the FUTEX_WAKE operation.

• FUTEX_WAKE: This operation wakes at most val processes waiting on this futex address (i.e. inside FUTEX_WAIT). The arguments timeout, uaddr2 and val3 are ignored.

Example: for futex(7), this is executed if incrementing the count showed that there were waiters, once the futex value has been set to 1 (indicating that it is available).

• FUTEX_FD: To support asynchronous wakeups, this operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use. The arguments timeout, uaddr2 and val3 are ignored. To prevent race conditions, the caller should test whether the futex has been upped after FUTEX_FD returns.

• FUTEX_REQUEUE (since Linux 2.5.70): This operation was introduced in order to avoid a "thundering herd" effect when FUTEX_WAKE is used and all processes woken up need to acquire another futex. This call wakes up val processes and requeues all other waiters on the futex at address uaddr2. The arguments timeout and val3 are ignored.

• FUTEX_CMP_REQUEUE (since Linux 2.6.7): There was a race in the intended use of FUTEX_REQUEUE, so FUTEX_CMP_REQUEUE was introduced. This is similar to FUTEX_REQUEUE, but it first checks whether the location uaddr still contains the value val3. If not, the error EAGAIN is returned. The argument timeout is ignored.

    4.6 VALUES RETURNED BY THE ABOVE FUNCTIONS:

• FUTEX_WAIT: Returns 0 if the process was woken by a FUTEX_WAKE call. In case of timeout, ETIMEDOUT is returned. If the futex was not equal to the expected value, the operation returns EWOULDBLOCK. Signals (or other spurious wakeups) cause FUTEX_WAIT to return EINTR.

• FUTEX_WAKE: Returns the number of processes woken up.

• FUTEX_FD: Returns the new file descriptor associated with the futex.


• FUTEX_REQUEUE: Returns the number of processes woken up.

• FUTEX_CMP_REQUEUE: Returns the number of processes woken up.

    4.7 ERRORS:

• EACCES: No read access to the futex memory.

• EAGAIN: FUTEX_CMP_REQUEUE found an unexpected futex value. (This probably indicates a race; use the safe FUTEX_WAKE now.)

• EFAULT: Error retrieving timeout information from userspace.

• EINVAL: An operation was not defined, or there was an error in page alignment.

• ENFILE: The system limit on the total number of open files has been reached.

    4.8 VERSIONS:

Initial futex support was merged in Linux 2.5.7, but with different semantics from those described above. Current semantics are available from Linux 2.5.40 onwards, while additional functionality became available around Linux 2.5.70 (FUTEX_REQUEUE) and 2.6.7 (FUTEX_CMP_REQUEUE).


    CHAPTER NO.5

    MUTEXES AND SEMAPHORES

    5.1 WHAT ARE MUTEXES:

Mutual exclusion (often abbreviated to mutex) algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource. The critical section by itself is not a mechanism or algorithm for mutual exclusion: a program, process, or thread can have a critical section in it without any mechanism or algorithm which implements mutual exclusion.

Examples of such resources are fine-grained flags, counters or queues, used to communicate between code that runs concurrently, such as an application and its interrupt handlers. The synchronization of access to those resources is an acute problem because a thread can be stopped or started at any time.

To illustrate: suppose a section of code is altering a piece of data over several program steps when another thread, perhaps triggered by some unpredictable event, starts executing. If this second thread reads from the same piece of data, the data, which is in the process of being overwritten, is in an inconsistent and unpredictable state. If the second thread tries overwriting that data, the ensuing state will probably be unrecoverable. These shared data being accessed by critical sections of code must therefore be protected, so that other processes which read from or write to the chunk of data are excluded from running.

A mutex is also a common name for a program object that negotiates mutual exclusion among threads, also called a lock.

    5.2 DESCRIPTION:

The purpose of mutexes is to let threads share resources safely. If two or more threads attempt to manipulate a data structure with no locking between them, then the system may run for quite some time without apparent problems, but sooner or later the data structure will become inconsistent and the application will start behaving strangely and is quite likely to crash. The same can apply even when manipulating a single variable or some other resource. For example, consider:


static volatile int counter = 0;

void
process_event(void)
{
    counter++;
}

Assume that after a certain period of time counter has a value of 42, and two threads A and B running at the same priority call process_event(). Typically thread A will read the value of counter into a register, increment this register to 43, and write this updated value back to memory. Thread B will do the same, so usually counter will end up with a value of 44. However, if thread A is timesliced after reading the old value 42 but before writing back 43, thread B will still read back the old value and will also write back 43. The net result is that the counter only gets incremented once, not twice, which depending on the application may prove disastrous. Sections of code like the above which involve manipulating shared data are generally known as critical regions. Code should claim a lock before entering a critical region and release the lock when leaving. Mutexes provide an appropriate synchronization primitive for this.

static volatile int counter = 0;
static cyg_mutex_t lock;

void
process_event(void)
{
    cyg_mutex_lock(&lock);
    counter++;
    cyg_mutex_unlock(&lock);
}

A mutex must be initialized before it can be used, by calling cyg_mutex_init. This takes a pointer to a cyg_mutex_t data structure which is typically statically allocated, and may be part of a larger data structure. If a mutex is no longer required and there are no threads waiting on it, then cyg_mutex_destroy can be used.


The main functions for using a mutex are cyg_mutex_lock and cyg_mutex_unlock. In normal operation cyg_mutex_lock will return success after claiming the mutex lock, blocking if another thread currently owns the mutex. However, the lock operation may fail if other code calls cyg_mutex_release or cyg_thread_release, so if these functions may get used then it is important to check the return value. The current owner of a mutex should call cyg_mutex_unlock when the lock is no longer required. This operation must be performed by the owner, not by another thread.

cyg_mutex_trylock is a variant of cyg_mutex_lock that will always return immediately, returning success or failure as appropriate. This function is rarely useful. Typical code locks a mutex just before entering a critical region, so if the lock cannot be claimed then there may be nothing else for the current thread to do. Use of this function may also cause a form of priority inversion if the owner runs at a lower priority, because the priority inheritance code will not be triggered. Instead the current thread continues running, preventing the owner from getting any CPU time, completing the critical region, and releasing the mutex.

cyg_mutex_release can be used to wake up all threads that are currently blocked inside a call to cyg_mutex_lock for a specific mutex. These lock calls will return failure. The current mutex owner is not affected.
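The examples above use the eCos kernel API. On Linux, where the NPTL pthreads library builds its mutexes on top of futexes, the equivalent pattern is the following short sketch:

#include <pthread.h>

static volatile int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void process_event(void)
{
    pthread_mutex_lock(&lock);      /* uncontended case: handled entirely in userspace */
    counter++;
    pthread_mutex_unlock(&lock);    /* contended case: a FUTEX_WAKE wakes a sleeping waiter */
}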

5.3 RECURSIVE MUTEXES:

The implementation of mutexes within the eCos kernel does not support recursive locks. If a thread has locked a mutex and then attempts to lock the mutex again, typically as a result of some recursive call in a complicated call graph, then either an assertion failure will be reported or the thread will deadlock. This behaviour is deliberate. When a thread has just locked a mutex associated with some data structure, it can assume that that data structure is in a consistent state. Before unlocking the mutex again it must ensure that the data structure is again in a consistent state.

Recursive mutexes allow a thread to make arbitrary changes to a data structure, then in a recursive call lock the mutex again while the data structure is still inconsistent. The net result is that code can no longer make any assumptions about data structure consistency, which defeats the purpose of using mutexes.

5.4 WHAT ARE SEMAPHORES:

In computer science, a semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.


A useful way to think of a semaphore is as a record of how many units of a particular resource are available, coupled with operations to safely (i.e. without race conditions) adjust that record as units are required or become free, and, if necessary, to wait until a unit of the resource becomes available.

Semaphores are a useful tool in the prevention of race conditions and deadlocks; however, their use is by no means a guarantee that a program is free from these problems. Semaphores which allow an arbitrary resource count are called counting semaphores, whilst semaphores which are restricted to the values 0 and 1 (or locked/unlocked, unavailable/available) are called binary semaphores.

The semaphore concept was invented by Dutch computer scientist Edsger Dijkstra, and the concept has found widespread use in a variety of operating systems.

5.5 SYSTEMS OPERATED ON SEMAPHORES:

1. Semaphore line, a system of long-distance communication based on towers with moving arms.
2. Flag semaphore systems.
3. Railway semaphore signals for railway traffic control.

1. Semaphore Line:

A semaphore telegraph, optical telegraph, shutter telegraph chain, Chappe telegraph, or Napoleonic semaphore is a system of conveying information by means of visual signals, using towers with pivoting shutters, also known as blades or paddles. Information is encoded by the position of the mechanical elements; it is read when the shutter is in a fixed position. These systems were popular in the late 18th to early 19th century. In modern usage, "semaphore line" and "optical telegraph" may refer to a relay system using flag semaphore.

Semaphore lines were a precursor of the electrical telegraph. They were far faster than post riders for bringing a message over long distances, but far more expensive and less private than the electrical telegraph lines which would replace them. The distance that an optical telegraph can bridge is limited by geography and weather; thus, in practical use, most optical telegraphs used lines of relay stations to bridge longer distances.

2. Semaphore Flags:

Flag semaphore is a system for conveying information at a distance by means of visual signals with hand-held flags, rods, disks, paddles, or occasionally bare or gloved hands. Information is encoded by the position of the flags; it is read when the flag is in a fixed position. Semaphores were adopted and widely used (with hand-held flags replacing the mechanical arms of shutter semaphores) in the maritime world in the early 19th century. Semaphore signals were used, for example, at the Battle of Trafalgar. This was the period in which the modern naval semaphore system was invented. This system uses hand-held flags. It is still used during underway replenishment at sea and is acceptable for emergency communication in daylight or, using lighted wands instead of flags, at night.

3. Railway Semaphore Signals:

One of the earliest forms of fixed railway signal is the semaphore. These signals display their different indications to train drivers by changing the angle of inclination of a pivoted 'arm'. Semaphore signals were patented in the early 1840s by Joseph James Stevens, and soon became the most widely used form of mechanical signal. Designs have altered over the intervening years, and colour light signals have replaced semaphore signals in some countries, but in others they remain in use.

5.6 MUTEXES Vs SEMAPHORES:

• Mutexes: "Mutexes are typically used to serialise access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A mutex object only allows one thread into a controlled section, forcing other threads which attempt to gain access to that section to wait until the first thread has exited from that section."

• Semaphores: "A semaphore restricts the number of simultaneous users of a shared resource up to a maximum number. Threads can request access to the resource (decrementing the semaphore), and can signal that they have finished using the resource (incrementing the semaphore)."
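To make the distinction concrete, the following sketch uses POSIX counting semaphores (sem_init/sem_wait/sem_post, which glibc's NPTL also implements on top of futexes) to limit the number of simultaneous users of a resource; the limit of 4 is an arbitrary illustration:

#include <semaphore.h>

#define MAX_USERS 4                  /* arbitrary limit, for illustration only */

static sem_t resource_slots;

void setup(void)
{
    sem_init(&resource_slots, 0, MAX_USERS);   /* 0 = shared between threads of one process */
}

void use_resource(void)
{
    sem_wait(&resource_slots);       /* decrement: blocks once all slots are taken */
    /* ... use the shared resource ... */
    sem_post(&resource_slots);       /* increment: frees a slot for another thread */
}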


    CHAPTER NO. 6

    POSIX

    6.1 WHAT IS POSIX:

    "Portable Operating System Interface [for Unix] is the name of a family of relatedstandards specified by the IEEE to define the application programming interface (API), along

    with shell and utilities interfaces for software compatible with variants of the Unix operatingsystem, although the standard can apply to any operating system.

    Originally, the name "POSIX" referred to IEEE Std 1003.1-1988, released, as the name suggests,in 1988. The family of POSIX standards is formally designated as IEEE 1003 and the

    international standard name is ISO/IEC 9945.

    The standards, formerly known as IEEE-IX, emerged from a project that began circa 1985.Richard Stallman suggested the name POSIX in response to an IEEE request for a memorable

    name.

    The POSIX specifications for Unix-like operating system environments originallyconsisted of a single document for the core programming interface, but eventually grew to 17

    separate documents.The standardized user command line and scripting interface were based onthe Korn shell. Many user-level programs, services, and utilities including awk, echo, ed were

    also standardized, along with required program-level services including basic I/O (file, terminal,and network) services. POSIX also defines a standard threading library API which is supported

    by most modern operating systems.

    6.2 SCRIPTING INTERFACE:

A command-line interface (CLI) is a mechanism for interacting with a computer operating system or software by typing commands to perform specific tasks. This text-only interface contrasts with the use of a mouse pointer with a graphical user interface (GUI) to click on options, or menus on a text user interface (TUI) to select options. This method of instructing a computer to perform a given task is referred to as "entering" a command: the system waits for the user to conclude the submission of the text command by pressing the "Enter" key (a descendant of the "carriage return" key of a typewriter keyboard). A command-line interpreter then receives, parses, and executes the requested user command. The command-line interpreter may be run in a text terminal or in a terminal emulator window as a remote shell client such as PuTTY. Upon completion, the command usually returns output to the user in the form of text lines on the CLI.


6.3 KORN SHELL:

The Korn shell (ksh) is a Unix shell which was developed by David Korn (AT&T Bell Laboratories) in the early 1980s and announced at the Toronto USENIX conference on July 14, 1983. Other early contributors were AT&T Bell Labs developers Mike Veach, who wrote the emacs code, and Pat Sullivan, who wrote the vi code. ksh is backwards-compatible with the Bourne shell and includes many features of the C shell as well, such as a command history, which was inspired by the requests of Bell Labs users.

6.4 BOURNE SHELL:

The Bourne shell, or sh, was the default Unix shell of Unix Version 7, and replaced the Thompson shell, whose executable file had the same name, sh. It was developed by Stephen Bourne, of AT&T Bell Laboratories, and was released in 1977 in the Version 7 Unix release distributed to colleges and universities. It remains a popular default shell for Unix accounts. The binary program of the Bourne shell or a compatible program is located at /bin/sh on most Unix systems, and it is still the default shell for the root superuser on many current Unix implementations.

Its command interpreter contained all the features that are commonly considered to produce structured programs. Although it is used as an interactive command interpreter, it was always intended as a scripting language. It gained popularity with the publication of The UNIX Programming Environment by Brian W. Kernighan and Rob Pike, the first commercially published book that presented the shell as a programming language in tutorial form.


    CHAPTER NO. 7

    THE 2.5.7 IMPLEMENTATION

    7.1 THE IMPLEMENTATION OF FUTEX:

The initial implementation, which was judged a sufficient basis for kernel inclusion, used a single two-argument system call, sys_futex(struct futex *, int op). The first argument was the address of the futex, and the second was the operation, used to further demultiplex the system call and insulate the implementation somewhat from the problems of system call number allocation. The latter is especially important as the system call is expanded as new operations are required. The sys_futex system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address. While the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so that the same memory mapped in different locations will correspond for sys_futex calls.

The two valid op numbers for this implementation were FUTEX_UP and FUTEX_DOWN. The algorithm was simple, the file linux/kernel/futex.c containing 140 code lines, and 233 in total.

The algorithm implemented is as given below:

1. The user address is checked for alignment and that it does not overlap a page boundary.

2. The page is pinned: this involves looking up the address in the process's address space to find the appropriate struct page *, and incrementing its reference count so it cannot be swapped out.

3. The struct page * and the offset within the page are added, and that result is hashed, using the recently introduced fast multiplicative hashing routines, to give a hash bucket in the futex hash table.

4. The op argument is then examined. If it is FUTEX_DOWN then:

(a) The process is marked INTERRUPTIBLE, meaning it is ready to sleep.

(b) A struct futex_q is chained to the tail of the hash bucket determined in step 3: the tail is chosen to give FIFO ordering for wakeups. This structure contains a pointer to the process and the struct page * and offset which identify the futex uniquely.

(c) The page is mapped into low memory (if it is a high-memory page), an atomic decrement of the futex address is attempted, and the page is then unmapped again. If this does not decrement the counter to zero, we check for signals (setting the error to EINTR and going to the next step), schedule, and then repeat this step.


(d) Otherwise, we now have the futex, or have received a signal, so we mark this process RUNNING, unlink ourselves from the hash table, wake the next waiter if there is one, and return 0 or -EINTR. We have to wake another process so that it decrements the futex to -1 to indicate that it is waiting (in the case where we have the futex), or to avoid the race where a signal came in just as we were woken up to get the futex (in the case where a signal was received).

5. If the op argument is FUTEX_UP:

(a) Map the page into low memory if it is a high-memory page.

(b) Set the count of the futex to one (available).

(c) Unmap the page if it was mapped from high memory.

(d) Search the hash table for the first struct futex_q associated with this futex, and wake up that process.

6. Otherwise, if the op argument is anything else, set the error to EINVAL.

7. Unpin the page.

While there are several subtleties in this implementation, it gives a second major advantage over System V semaphores: there are no explicit limits on how many futexes you can create, nor can one futex user starve other users of futexes. This is because the futex is merely a memory location like any other until the sys_futex syscall is entered, and each process can only do one sys_futex syscall at a time, so we are limited to pinning one page per process into memory, at worst.

    7.2 PROBLEMS AND MODIFICATIONS IN 2.5.7 IMPLEMENTATION:

• PROBLEMS:

1. Once the first implementation entered the mainstream experimental kernel, it drew the attention of a much wider audience, in particular those concerned with implementing POSIX threads, and attention also returned to the fairness issue.

2. There is no straightforward way to implement the pthread_cond_timedwait primitive: this operation requires a timeout, but using a timer is difficult, as these must not interfere with their use by any other code.


3. The pthread_cond_broadcast primitive requires every sleeping process to be woken up, which does not fit well with the 2.5.7 implementation, where a process only exits the kernel when the futex has been successfully obtained or a signal is received.

4. For N:M threading, such as the Next Generation POSIX Threads project [5], an asynchronous interface for finding out about the futex is required, since a single process (containing multiple threads) might be interested in more than one futex.

5. Starvation occurs in the following situation: a single process which immediately drops and then immediately competes for the lock will regain it before any woken process will.

• MODIFICATIONS:

With these limitations brought to light, we searched for another design which would be flexible enough to cater for these diverse needs. After various implementation attempts and discussions we settled on a variation of the atomic_compare_and_swap primitive, with the atomicity guaranteed by passing the expected value into the kernel for checking. With these modifications, futexes have been implemented in the Linux kernel from version 2.6 onwards.

To do this, two new op values, FUTEX_WAIT and FUTEX_WAKE, replaced the operations above, and the system call was changed to take two additional arguments, int val and struct timespec *reltime.

FUTEX_WAIT: This operation atomically verifies that the futex address still contains the value given, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. It is similar to the previous FUTEX_DOWN, except that the looping and manipulation of the counter is left to userspace.

    This works as follows:

    1. Set the process state to INTERRUPTIBLE, and place struct futex_q into the hash table as

    before.

    2. Map the page into low memory (if in high memory).

    3. Read the futex value.

    4. Unmap the page (if mapped at step 2).

    5. If the value read at step 3 is not equal to the val argument provided to the system call, set the

    return to EWOULDBLOCK.


6. Otherwise, sleep for the time indicated by the reltime argument, or indefinitely if that is NULL.

(a) If we timed out, set the return value to ETIMEDOUT.

(b) Otherwise, if there is a signal pending, set the return value to EINTR.

7. Try to remove our struct futex_q from the hash table: if we were already removed, return 0 (success) unconditionally, as this means we were woken up; otherwise return the error code specified above.

FUTEX_WAKE: This is similar to the previous FUTEX_UP, except that it does not alter the futex value; it simply wakes one (or more) processes. The number of processes to wake is controlled by the int val parameter, and the return value for the system call is the number of processes actually woken and removed from the hash table.

FUTEX_AWAIT: This is proposed as an asynchronous operation to notify the process via a SIGIO-style mechanism when the value changes. The exact method has not yet been settled (see future work in Section 5). This new primitive is only slightly slower than the previous one, in that the time between waking the process and that process attempting to claim the lock has increased (as the lock claim is done in userspace on return from the FUTEX_WAKE syscall), and if the process has to attempt the lock multiple times before success, each attempt will be accompanied by a syscall, rather than the syscall claiming the lock itself. On the other hand, the following can be implemented entirely in the userspace library:

1. All the POSIX-style locks, including pthread_cond_broadcast (which requires the wake-all operation) and pthread_cond_timedwait (which requires the timeout argument). One of the authors (Rusty) has implemented a non-pthreads demonstration library which does exactly this.

2. Read-write locks in a single word, on architectures which support cmpxchg-style primitives.

3. FIFO wakeup, where fairness is guaranteed to anyone waiting.

Finally, it is worthwhile pointing out that the kernel implementation requires exactly the same number of lines as the previous implementation.

FUTEX_FD: This is used to support asynchronous wakeups. This operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use. To prevent race conditions, the caller should test whether the futex has been upped after FUTEX_FD returns. It returns the new file descriptor associated with the futex.


    CHAPTER NO. 8

    ADVANTAGES, DIS-ADVANTAGES AND REQUIREMENTS

• ADVANTAGES:

1. Futexes make locking in the operating system fast and maintain quick access.

2. User-level locking is the special feature of futexes, which helps avoid interrupting other users working on the same system.

3. A large kernel space, which helps multiple users operate on the system.

• DIS-ADVANTAGES:

1. Blocking while waiting to be notified that the lock has been released.

2. Yielding the processor until the thread is rescheduled, and then trying to acquire the lock again.

3. Spinning, consuming CPU cycles, until the lock is released.

    y REQUIREMENTS:There are various behavioral requirements that need to be considered. Most center

    around the fairness of the locking scheme and the lock release policy. In a fair locking scheme

    the lock is granted in the order it was requested, i.e., it is handed over to the longest waiting task.

    This can have negative impact on throughput due to the increased number of context switches.At the same time it can lead to the so called convoy problem. Since, the locks are granted in theorder of request arrival, they all proceed at the speed of the slowest process, slowing down all

    waiting processes. A common solution to the convoy problem has been to mark the lockavailable upon release, wake all waiting processes and have them recontend for the lock.

    This is referred to as random fairness, although higher priority tasks will usually

    have an advantage over lower priority ones. However, this also leads to the well known

  • 8/4/2019 My Final Seminar Reprt

    27/33

    FUTEX(Fast Userspace muTEX)

    27

    thundering herd problem. Despite this, wake-all can work quite well on uni-processor systems
    if the first task to wake releases the lock before being preempted, allowing the second herd
    member to obtain the lock, and so on. It works far less well on Symmetric Multi-Processing
    (SMP) systems. To avoid this problem, one should wake up only one waiting task upon lock
    release. Marking the lock available as part of releasing it gives the releasing task the
    opportunity to reacquire the lock immediately, if so desired, avoiding unnecessary context
    switches and the convoy problem. Some refer to these schemes as greedy, as the running task
    has the highest probability of reacquiring the lock if the lock is hot. However, this can lead
    to starvation. Hence, the basic mechanisms must enable fair locking, random locking, and
    greedy or convoy-avoidance locking (ca-locking for short).
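
    The policy choice between waking everyone and waking a single task maps directly onto the val
    argument of FUTEX_WAKE. The snippet below is a minimal sketch of the two release paths; the
    function names are illustrative, and the atomic store that actually marks the lock free is
    assumed to have happened just before the wake call.

        /* Wake-one versus wake-all on lock release (illustrative sketch). */
        #include <linux/futex.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <limits.h>
        #include <stdint.h>

        /* Hand the wakeup to a single waiter: avoids the thundering herd,
         * but can form convoys if waiters are served strictly in turn. */
        static void release_wake_one(uint32_t *futex_word)
        {
            syscall(SYS_futex, futex_word, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

        /* Wake every waiter and let them recontend ("random fairness"):
         * breaks convoys but herds all waiters onto the lock at once. */
        static void release_wake_all(uint32_t *futex_word)
        {
            syscall(SYS_futex, futex_word, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
        }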

    Another requirement is to enable spin locking, i.e., to have an application spin on the
    availability of the lock for some user-specified time (or until the lock is granted) before
    giving up and blocking in the kernel. Spin locks are useful for short critical sections; if a
    long critical section cannot be avoided, blocking synchronization primitives should be used
    instead of spin-lock primitives. Hence an application has the choice to either (a sketch of a
    combined spin-then-block acquire path follows this list):

    A. block, waiting to be notified that the lock has been released;

    B. yield the processor until the thread is rescheduled, and then try to acquire the lock again; or

    C. spin, consuming CPU cycles until the lock is released.
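
    The sketch below combines options C and A: it spins on an atomic compare-and-swap for a
    bounded number of iterations (cheap when the critical section is short) and only then blocks
    in the kernel with FUTEX_WAIT. The two-state encoding (0 = free, 1 = held) and the spin limit
    are simplifying assumptions for illustration; real futex-based mutexes normally track a third
    "locked with waiters" state so that the release path can skip the FUTEX_WAKE syscall when
    nobody is waiting.

        /* Spin-then-block lock (minimal sketch, not a production mutex). */
        #include <linux/futex.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <stdatomic.h>

        #define SPIN_LIMIT 100                 /* illustrative bound on userspace spinning */

        typedef struct { atomic_uint word; } sb_lock_t;   /* 0 = free, 1 = held */

        static void sb_lock(sb_lock_t *l)
        {
            for (;;) {
                /* Option C: spin briefly, hoping the holder releases soon. */
                for (int i = 0; i < SPIN_LIMIT; i++) {
                    unsigned int expected = 0;
                    if (atomic_compare_exchange_weak(&l->word, &expected, 1))
                        return;                /* fast path: no system call at all */
                }
                /* Option A: sleep, but only while the word still reads 1. */
                syscall(SYS_futex, &l->word, FUTEX_WAIT, 1, NULL, NULL, 0);
            }
        }

        static void sb_unlock(sb_lock_t *l)
        {
            atomic_store(&l->word, 0);         /* mark the lock free */
            syscall(SYS_futex, &l->word, FUTEX_WAKE, 1, NULL, NULL, 0); /* wake one waiter */
        }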

    Thus, with respect to performance, there are basically two overriding goals:

    1. Avoid system calls if possible, as a system call typically consumes several hundred
    instructions.

    2. Avoid unnecessary context switches, which carry the overhead of saving and restoring state
    and of the associated TLB invalidations.

    Each time a process is removed from the processor, sufficient information about its current
    operating state must be stored so that, when it is again scheduled to run, it can resume its
    operation from an identical position. This operational state data is known as its context, and
    the act of removing the process's thread of execution from the processor (and replacing it
    with another) is known as a process switch or context switch.


    A typical process switch or context switch involves saving the outgoing process's CPU
    registers and program counter into its task structure, choosing the next runnable process,
    switching the memory-management context (which may invalidate TLB entries), and restoring the
    saved state of the incoming process.

    Another requirement is that fast user-level locking should be simple enough to provide the
    basic foundation to efficiently enable more complicated synchronization constructs, e.g.
    semaphores, rwlocks, blocking locks, spin versions of these, pthread mutexes, and DB latches.
    It should also allow for a clean separation of the blocking requirements towards the kernel,
    so that blocking only has to be implemented with a small set of different constructs. This
    allows the use of the basic primitives to be extended without kernel modifications. Of
    particular interest is the implementation of mutexes, semaphores and multiple reader / single
    writer locks.

    Finally, a solution needs to be found that enables recovery from dead locks. Deadlock is a
    permanent blocking of a set of processes that either compete for system resources or
    communicate with each other. Deadlock may be addressed by prevention or by deadlock avoidance.
    Mutual exclusion prevents two threads from accessing the same resource simultaneously, and is
    one of the conditions under which deadlock can arise. Deadlock avoidance can include
    initiation denial or allocation denial, both of which serve to eliminate the state required
    for deadlock before it arises. We define unrecoverable locks as those that have been acquired
    by a process which then terminates without releasing the lock. There are no convenient means
    for the kernel or for other processes to determine which locks are currently held by a
    particular process, since lock acquisition is achieved through manipulation of user memory.

    Each process has some form of associated process identifier (PID) through which it may be
    manipulated. The process also carries the user identifier (UID) of the person who initiated it
    and a group identifier (GID). Registering the process's PID after lock acquisition is not
    enough, as the two operations are not atomic: if the process dies before it can register its
    PID, or if it has cleared its PID but dies before being able to release the lock, the lock is
    unrecoverable.
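
    This race is the reason why simply writing the owner's identity next to the lock word does not
    work. The sketch below shows the alternative later adopted by the kernel's robust-futex work:
    the owner's thread ID is used as the lock value itself, so acquiring the lock and recording
    ownership become a single atomic operation. The helper names and the use of the raw gettid
    system call are illustrative assumptions.

        /* Recording ownership atomically: the owner TID *is* the lock value.
         * Minimal sketch of the idea behind robust futexes, not the real API. */
        #include <linux/futex.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <stdatomic.h>

        static atomic_uint lock_word;        /* 0 = free, otherwise the holder's TID */

        static int try_lock_with_owner(void)
        {
            unsigned int expected = 0;
            unsigned int me = (unsigned int)syscall(SYS_gettid);
            /* One atomic step: either the word now carries our TID and we own
             * the lock, or it already carries the TID of the current owner. */
            return atomic_compare_exchange_strong(&lock_word, &expected, me);
        }

        static void unlock_with_owner(void)
        {
            atomic_store(&lock_word, 0);                                  /* release */
            syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0); /* wake a waiter */
        }

    With this encoding, a lock whose holder has died still carries the dead task's TID, which is
    exactly the ownership information the PID-registration scheme above cannot record reliably.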


    CHAPTER NO. 9

    FUTURE WORK

    A lot of time is wasted between one process releasing the lock and another process acquiring
    it. In order to reduce that time, an asynchronous wait extension is planned.

    Currently, system calls are avoided only in the uncontended case; it would be better if the
    remaining kernel interactions could be reduced as well. For this purpose, the authors are
    working with the NGPT team to build global POSIX mutexes over futexes. NGPT supports an M:N
    threading model, i.e., M user-level threads are executed over N tasks. Conceptually, the N
    tasks provide virtual processors on which the M user threads execute. When a user-level
    thread, running on one of these N tasks, needs to block on a futex, it should not block the
    task, as the task provides the virtual processing. Instead, only the user thread should be
    descheduled by the thread manager of the NGPT system.

    Nevertheless, a waitobj must be attached to the wait queue in the kernel, indicating that a
    user thread is waiting on a particular futex and that the task group needs a notification with
    respect to continuation on that futex. Once the thread manager receives the notification it
    can reschedule the previously blocked user thread. For this the authors propose an additional
    operation, AFUTEX_WAIT, for the sys_futex system call. Its task is to append a waitobj to the
    futex's kernel wait queue and continue. This waitobj cannot be allocated on the stack and must
    be allocated and de-allocated dynamically.

    Dynamic allocation has the disadvantage that the waitobjs must be freed even during an
    irregular program exit. It also poses a denial-of-service threat, in that a malicious
    application can continuously call sys_futex(AFUTEX_WAIT).

    The general solutions seem to converge on the use of a /dev/futex device to control resource
    consumption. The first solution is to allocate a file descriptor fd from the /dev/futex device
    for each outstanding asynchronous waitobj. Conveniently, these descriptors should be pooled to
    avoid constantly opening and closing the device. The private data of the file would simply be
    the waitobj. Upon completion a SIGIO is sent to the application.

    The advantage of this approach is that the denial-of-service exposure is naturally limited by
    the file descriptor limits imposed on a process. Furthermore, on program death, all waitobjs
    still enqueued can easily be dequeued. The disadvantage is that this approach can
    significantly pollute the fd space. Another solution proposed has been to open only one fd,
    but to allow multiple waitobj allocations on that fd.

    This approach removes the fd-space pollution issue but requires an additional tuning parameter
    for how many outstanding waitobjs should be allowed per fd. It also requires proper resource
    management of the waitobjs in the kernel. At this point no definite decision has been reached
    on which direction to proceed. The question of priorities in futexes has also been raised: the
    current implementation is strictly FIFO order. The use of nice levels is almost certainly too


    restrictive, so some other priority method would be required. Expanding the system call to add
    a priority argument is possible if a clear application advantage were demonstrated.


    CONCLUSION

    In this report we described a fast user-level locking mechanism, called futexes, which was
    integrated into the Linux 2.5 development kernel and is part of the Linux 2.6 kernel. The
    various requirements for such a package, previous solutions, and the current futex package
    were outlined. In terms of performance, futexes can provide significant advantages over
    standard System V IPC semaphores in all case studies.


