w4118 operating systems instructor: junfeng yang

W4118 Operating Systems

Instructor: Junfeng Yang

Logistics

Homework 2 update: assembly code to call a hook function written in C

• syscall_fail(long syscall_nr)

Clarifications on sys_fail(int ith, int ncall, struct syscall_failures calls)

• Only fail system calls from the current process
• Each fail() injects only one failure
• If a system call matches one of the system call numbers specified in the calls argument, count it as one matching call toward ith

Mac users: talk to TA to get access to VMware fusion

Last lecture

• Processes in Linux
• Context switch on x86
  • Kernel stack captures all state
  • switch stack = switch process
• Threads: good at expressing concurrency efficiently
• Multithreading models: different tradeoffs
• Race conditions

Recall: banking example

    #include <pthread.h>
    #include <stdio.h>

    int balance = 1000;

    void* deposit(void *arg)
    {
        int i;
        for(i=0; i<1e7; ++i)
            ++ balance;
        return NULL;
    }

    void* withdraw(void *arg)
    {
        int i;
        for(i=0; i<1e7; ++i)
            -- balance;
        return NULL;
    }

    int main()
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, (void*)1);
        pthread_create(&t2, NULL, withdraw, (void*)2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("all done: balance = %d\n", balance);
        return 0;
    }

Recall: a closer look at the banking example

    $ objdump -d bank
    …
    08048464 <deposit>:
    …
                                        // ++ balance
     8048473: a1 80 97 04 08    mov 0x8049780,%eax
     8048478: 83 c0 01          add $0x1,%eax
     804847b: a3 80 97 04 08    mov %eax,0x8049780
    …

    0804849b <withdraw>:
    …
                                        // -- balance
     80484aa: a1 80 97 04 08    mov 0x8049780,%eax
     80484af: 83 e8 01          sub $0x1,%eax
     80484b2: a3 80 97 04 08    mov %eax,0x8049780
    …

Avoiding Race Conditions

Race condition: a timing dependent error involving shared state

Critical section: a segment of code that accesses shared variable (or resource) and must not be concurrently executed by more than one thread

    // ++ balance
    mov 0x8049780,%eax
    add $0x1,%eax
    mov %eax,0x8049780
    …

    // -- balance
    mov 0x8049780,%eax
    sub $0x1,%eax
    mov %eax,0x8049780
    …

How to implement critical sections?

Atomic operations: no other instructions can be interleaved; executed “as a unit”, “all or none”, guaranteed by hardware

A possible solution: create a super instruction that does what we want atomically

    add $0x1, 0x8049780

Problem
• Can’t anticipate every possible way we want atomicity
• Increases hardware complexity, slows down other instructions

    // ++ balance
    mov 0x8049780,%eax
    add $0x1,%eax
    mov %eax,0x8049780
    …

    // -- balance
    mov 0x8049780,%eax
    sub $0x1,%eax
    mov %eax,0x8049780
    …

Layered approach to synchronization

Hardware-provided low-level atomic operations

High-level synchronization primitives

Properly synchronized application

Hardware provides simple low-level atomic operations, upon which we can build high-level, atomic operations, upon which we can implement critical sections and build correct multi-threaded/multi-process programs

Example low-level atomic operations and high-level synchronization primitives

Low-level atomic operations
• On uniprocessor, disable/enable interrupt
• x86 load and store of words
• Special instructions:
  • Test-and-set

High-level synchronization primitives
• Lock
• Semaphore
• Monitor

Will look at them all. Start with lock.

Locks (or Mutex: Mutual exclusion)

Two common operations
• lock(): acquire lock exclusively; wait if not available
• unlock(): release exclusive access to lock

pthread example

    pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;

    void* deposit(void *arg)
    {
        int i;
        for(i=0; i<1e7; ++i) {
            pthread_mutex_lock(&l);
            ++ balance;
            pthread_mutex_unlock(&l);
        }
    }

    void* withdraw(void *arg)
    {
        int i;
        for(i=0; i<1e7; ++i) {
            pthread_mutex_lock(&l);
            -- balance;
            pthread_mutex_unlock(&l);
        }
    }

Critical Section Goals

Requirements
• Safety (aka mutual exclusion): no more than one thread in critical section at a time
• Liveness (aka progress):
  • If multiple threads simultaneously request to enter critical section, must allow one to proceed
  • Must not depend on threads outside critical section
• Bounded waiting (aka starvation-free)
  • Must eventually allow waiting thread to proceed
• Makes no assumptions about the speed and number of CPUs
  • However, assumes each thread makes progress

Desirable properties
• Efficient: don’t consume too much resource while waiting
  • Don’t busy wait (spin wait). Better to relinquish CPU and let other thread run
• Fair: don’t make some thread wait longer than others. Hard to do efficiently
• Simple: should be easy to use

Implementing Locks: version 1

Can cheat on uniprocessor: implement locks by disabling and enabling interrupts

Linux kernel heavily used this trick in single core days

    cli(): __asm__ __volatile__("cli": : :"memory")
    sti(): __asm__ __volatile__("sti": : :"memory")

Good: simple!

Bad:
• Both operations are privileged, can’t let user program use
• Doesn’t work on multiprocessors

    lock()
    {
        disable_interrupt();
    }

    unlock()
    {
        enable_interrupt();
    }

Implementing Locks: version 2

Peterson’s algorithm: software-based lock implementation

Good: doesn’t require much from hardware

Only assumptions:
• Loads and stores are atomic
• They execute in order
• Does not require special hardware instructions

Software-based lock: 1st attempt

Idea: use one flag, test then set, if unavailable, spin-wait (or busy-wait)

    // 0: lock is available, 1: lock is held by a thread
    int flag = 0;

    lock()
    {
        while (flag == 1)
            ; // spin wait
        flag = 1;
    }

    unlock()
    {
        flag = 0;
    }

Why doesn’t it work?
• Not safe: both threads can be in critical section
• Not efficient: busy wait, particularly bad on uniprocessor (will solve this later)

Software-based locks: 2nd attempt

Idea: use per thread flags, set then test, to achieve mutual exclusion

    // 1: a thread wants to enter critical section, 0: it doesn't
    int flag[2] = {0, 0};

    lock()
    {
        flag[self] = 1; // I need lock
        while (flag[1 - self] == 1)
            ; // spin wait
    }

    unlock()
    {
        // not any more
        flag[self] = 0;
    }

Why doesn’t it work?
• Not live: can deadlock (both threads set their flags, then each spins waiting for the other)

Software-based locks: 3rd attempt

Idea: strict alternation to achieve mutual exclusion

    // whose turn is it?
    int turn = 0;

    lock()
    {
        // wait for my turn
        while (turn == 1 - self)
            ; // spin wait
    }

    unlock()
    {
        // I'm done. your turn
        turn = 1 - self;
    }

Why doesn’t it work?
• Not live: depends on threads outside critical section

Software-based locks: final attempt (Peterson’s algorithm)

    // whose turn is it?
    int turn = 0;
    // 1: a thread wants to enter critical section, 0: it doesn't
    int flag[2] = {0, 0};

    lock()
    {
        flag[self] = 1; // I need lock
        turn = 1 - self;
        // wait for my turn
        while (flag[1-self] == 1
               && turn == 1 - self)
            ; // spin wait while the
              // other thread has intent
              // AND it is the other
              // thread's turn
    }

    unlock()
    {
        // not any more
        flag[self] = 0;
    }

Why does it work? Safe? Live? Bounded wait?

Notes on Peterson’s algorithm

• Great way to start thinking of multi-threaded programming
  • Scheduler is “malicious”
• The algorithm is useful in other contexts as well
• Problem: doesn’t work with N>2 threads
  • Obvious extension: N flags, turn = 0,1,…,N-1 doesn’t work
  • Leslie Lamport’s Bakery algorithm
• More importantly, doesn’t really work on modern out-of-order processors

Next: implement locks with hardware support

Implementing locks: version 3

Problem with the test-then-set approach: test and set are not atomic

Special atomic operation
• test_and_set address, register
• load *address to register, and set *address to 1

    // 0: lock is available, 1: lock is held by a thread
    int flag = 0;

    lock()
    {
        while(test_and_set(&flag))
            ;
    }

    unlock()
    {
        flag = 0;
    }

Why does it work?

test_and_set on x86

• xchg reg, addr: atomically swaps *addr and reg
• spin_lock in Linux can be implemented using this instruction (include/asm-i386/spin_lock.h)
• Another way: append a “lock prefix” before an instruction
  • Examples in include/asm-i386/atomic.h

    long test_and_set(volatile long* lock)
    {
        int old;
        asm("xchgl %0, %1"
            : "=r"(old), "+m"(*lock) // output
            : "0"(1)                 // input
            : "memory"               // can clobber anything in
                                     // memory, so gcc won't
                                     // reorder this statement with others
        );
        return old;
    }

Spin-wait or block

So far the lock implementations we’ve seen are busy-wait or spin-wait locks: endlessly checking the lock flag without yielding CPU

• Problem: waste CPU cycles
  • Worst case: previous thread holding a busy-wait lock gets preempted, other threads try to acquire the same lock
• On uniprocessor: should not use spin-lock
  • Yield CPU when lock not available (need OS support)
• On multi-processor
  • Thread holding lock gets preempted ???
  • Correct action depends on how long before lock release
    • Lock released “quickly” → spin-wait
    • Lock released “slowly” → block
  • Quick or slow is relative to context-switch overhead
• Good plan: spin a bit, then block

Problem with simple yield

    lock()
    {
        while(test_and_set(&flag))
            yield();
    }

Problem:
• Still a lot of context switches
• Starvation possible

Why? No control over who gets the lock next

Need explicit control over who gets the lock

Implementing locks: version 4

Add thread to queue when lock unavailable

    typedef struct __mutex_t {
        int flag;     // 0: mutex is available, 1: mutex is not available
        int guard;    // guard lock to internal mutex data structure
        queue_t *q;   // queue of waiting threads
    } mutex_t;

    void lock(mutex_t *m)
    {
        while (test_and_set(&m->guard))
            ; // acquire guard lock by spinning
        if (m->flag == 0) {
            m->flag = 1; // acquire mutex
            m->guard = 0;
        } else {
            enqueue(m->q, self);
            m->guard = 0;
            yield();
        }
    }

    void unlock(mutex_t *m)
    {
        while (test_and_set(&m->guard))
            ;
        if (queue_empty(m->q))
            // release mutex; no one wants mutex
            m->flag = 0;
        else
            // hold mutex (for next thread!)
            wakeup(dequeue(m->q));
        m->guard = 0;
    }

Semaphore

• Synchronization tool that does not require busy waiting
• Semaphore S – integer variable
• Two standard operations modify S: acquire() and release()
  • Originally called P() and V()
    • from Dutch Proberen and Verhogen (Dijkstra)
  • Also called down() and up()
  • And even wait() and signal()
• Higher-level abstraction, less complicated
• Can only be accessed via two indivisible (atomic) operations

    P(S) {
        while(S <= 0)
            ;
        S--;
    }

    V(S) {
        S++;
    }

Semaphore Types

• Counting semaphore – integer value can range over an unrestricted domain
  • Used for synchronization
• Binary semaphore – integer value can range only between 0 and 1; can be simpler to implement
  • Used for mutual exclusion: same as mutex

Process i

    P(S);
    Critical Section
    V(S);
    Remainder Section

Semaphore Implementation

• Must guarantee that no two processes can execute P() / acquire() and V() / release() on the same semaphore at the same time
• Thus, implementation of these operations becomes the critical section problem again, where the acquire and release code are placed inside the critical section
  • Could now have busy waiting in critical section implementation
• But if we know we can’t acquire semaphore, should we busy wait and burn up the CPU?
  • Note that applications may spend lots of time in critical sections and therefore this is not a good solution
• We’d like a semaphore that sleeps (or at least lets someone else run)

Semaphore Implementation with no Busy waiting

With each semaphore there is an associated waiting queue. Each entry in a waiting queue has two data items:
• value (of type integer)
• pointer to next record in the list

Two operations:
• block – place the process invoking the operation on the appropriate waiting queue
• wakeup – remove one of the processes in the waiting queue and place it in the ready queue

Potential queuing policies: FIFO, LIFO, undefined

Semaphore Implementation with no Busy waiting (Cont.)

Implementation of acquire():

Implementation of release():

Producer-Consumer Problem

Bounded buffer: size ‘N’
• Access entry 0… N-1, then “wrap around” to 0 again

Producer process writes data to buffer
• Must not write more than ‘N’ items more than consumer “ate”

Consumer process reads data from buffer
• Should not try to consume if there is no data

[Figure: circular buffer with entries 0, 1, …, N-1 and In/Out pointers]

Solving Producer-Consumer Problem

Solving with semaphores
• We’ll use two kinds of semaphores
• We’ll use counters to track how much data is in the buffer
  • One counter counts as we add data and stops the producer if there are N objects in the buffer
  • A second counter counts as we remove data and stops a consumer if there are 0 in the buffer
• Idea: since general semaphores can count for us, we don’t need a separate counter variable

Why do we need a second kind of semaphore?
• We’ll also need a mutex semaphore

Producer-Consumer Problem

Shared: Semaphores mutex, empty, full;

Init:
    mutex = 1;   /* for mutual exclusion */
    empty = N;   /* number empty buf entries */
    full = 0;    /* number full buf entries */

Producer

    do {
        . . .
        // produce an item in nextp
        . . .
        P(empty);
        P(mutex);
        . . .
        // add nextp to buffer
        . . .
        V(mutex);
        V(full);
    } while (true);

Consumer

    do {
        P(full);
        P(mutex);
        . . .
        // remove item to nextc
        . . .
        V(mutex);
        V(empty);
        . . .
        // consume item in nextc
        . . .
    } while (true);

Common Errors with Semaphores

Process i

    P(S)
    CS
    P(S)

A typo. Process i will get stuck (forever) the second time it does the P() operation. Moreover, every other process will freeze up too when trying to enter the critical section!

Process j

    V(S)
    CS
    V(S)

A typo. Process j won’t respect mutual exclusion even if the other processes follow the rules correctly. Worse still, once we’ve done two “extra” V() operations this way, other processes might get into the CS inappropriately!

Process k

    P(S)
    CS

Whoever next calls P() will freeze up. The bug might be confusing because that other process could be perfectly correct code, yet that’s the one you’ll see hung when you use the debugger to look at its state!

Process l

    P(S)
    if (something) return;
    CS
    V(S)

Someone forgot to release the semaphore before returning! The next caller will get stuck.
