
Operating Systems

Course Review & Extensions


Distributed Systems

Fault Tolerance

© 2011, D. J. Foreman


Byzantine Generals Problem

Byzantine failures = arbitrary failures

■ Crashes, incorrect results, etc.

A problem for fault-tolerant distributed systems

System rules:

a. All loyal generals apply the same action plan

b. A small # of traitors cannot force a bad plan

c. Every system must receive the same info

■ N = total # of systems

■ T = # of “traitors” (i.e., failing systems)

Basic solution requires N>3T (i.e., N>=3T+1)
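The bound N>3T can be checked mechanically. The following is a minimal sketch; the function names `tolerates` and `max_traitors` are illustrative and do not come from the slides.

```python
def tolerates(n: int, t: int) -> bool:
    """Return True if n systems satisfy the basic bound N > 3T for t traitors."""
    return n > 3 * t


def max_traitors(n: int) -> int:
    """Return the largest T for which N >= 3T + 1 still holds."""
    return max((n - 1) // 3, 0)
```

For example, four systems can tolerate one traitor, while three systems cannot tolerate any.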


The dilemma

[Figure: processes P1, P2, and P3 exchange messages; the values shown are 1, 1, and 0.]


The dilemma – pt 2

[Figure: processes P1, P2, and P3 exchange messages; the values shown are 1, 0, and 1.]

Who sent the bad message, P1 or P3?


Application of the algorithm

Two stages:

1. All Pi send messages to n-2 other Pi (not back to the msg originator)

2. All Pi decide on an action

There are T rounds of msgs

For N processes, N-1 messages are sent in Round 1,

then (N-1)*(N-2) in Round 2,

and (N-1)*(N-2)*(N-3) in Round 3, etc.
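The per-round message counts above follow a falling product. A minimal sketch, assuming a hypothetical helper name `messages_per_round`, makes the growth concrete:

```python
def messages_per_round(n: int, rounds: int) -> list[int]:
    """Message count for rounds 1..rounds: (N-1), (N-1)(N-2), (N-1)(N-2)(N-3), ..."""
    counts = []
    total = 1
    for k in range(1, rounds + 1):
        total *= n - k  # round k multiplies in the factor (N-k)
        counts.append(total)
    return counts
```

With N=7 and three rounds, the counts are 6, 30, and 120, which shows how quickly the message cost rises with each extra round.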



Distributed Systems

Mutual Exclusion Algorithms



Timestamp mechanisms



Lamport’s Algorithm - pt 1

Assumptions:

1. Request R from Pi is time-stamped (Ti,i), where Ti=Ci is the value of Pi’s logical clock

2. Pi has a request queue RQi ordered by timestamp

[see Algorithm on next slide]

Verification:

1. Rule 3b & the assumption that messages are received in order guarantee that Pi has learned about all requests preceding R

2. Since the timestamps (Ti,i) totally order the requests, rule 3a provides mutual exclusion

Cost: N-1 requests, N-1 replies, N-1 releases


Lamport’s Algorithm - pt 2

Rules:

1. Pi puts Req on RQi & sends Req to all other Pj

2. When Pj gets Req, it puts it on RQj & acks

3. Pi is allowed access when a & b are true:

a. Pi’s own Req is at the front of RQi

b. Pi has received a msg from every other Pj with timestamp (Tj,j)>(Ti,i)

4. To release the resource, Pi removes its Req from RQi & sends a time-stamped RELEASE to all other Pj

5. When Pj receives the RELEASE, Pj removes Pi’s Req from its own RQj
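The access test in rule 3 can be sketched as a pure function. This is a minimal illustration, not a full implementation of the protocol: the name `may_enter` and the `latest_from_peer` dictionary (the newest timestamp seen from each other process) are assumptions introduced here, and timestamps are modelled as `(T, id)` tuples so that Python's tuple comparison gives the total order.

```python
def may_enter(own_req, request_queue, latest_from_peer):
    """Check rules 3a and 3b for a process whose request is own_req.

    own_req          -- (T, id) timestamp of this process's request
    request_queue    -- non-empty list of (T, id) requests known locally (RQi)
    latest_from_peer -- dict: peer id -> newest (T, id) timestamp received from it
    """
    # Rule 3a: our own request is the smallest (front) entry of the queue.
    at_front = min(request_queue) == own_req
    # Rule 3b: every other process has sent something timestamped later.
    heard_later = all(ts > own_req for ts in latest_from_peer.values())
    return at_front and heard_later
```

Ties on T are broken by the process id in the second tuple position, which is what makes the order total.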



Ricart & Agrawala’s Algorithm

More efficient than Lamport’s algorithm (needs only 2(N-1) messages)

Rules:

1. Pi puts Req on RQi & sends Req to all other Pj

2. When Pj gets Req:

a. If Pj is not also requesting, Pj replies (acks) to Pi

b. If Pj IS also requesting, and (Tj,j)<(Ti,i), Pj keeps (defers) (Ti,i); else Pj replies to Pi

3. When Pi gets a Reply from all other Pj, its Req is granted

4. When Pi releases, it sends a Reply for all pending Req’s
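Rule 2 is the heart of the algorithm and can be sketched as a small decision function. The name `handle_request` and the string results are illustrative; the sketch also assumes that a process currently holding the resource counts as "requesting", which the slide does not state explicitly.

```python
def handle_request(requesting: bool, own_ts: tuple, incoming_ts: tuple) -> str:
    """Decide what Pj does with a request stamped incoming_ts = (Ti, i).

    own_ts is Pj's own request timestamp (Tj, j); it is ignored when Pj
    is not requesting.
    """
    if requesting and own_ts < incoming_ts:
        # Pj's own request is older, so Pi's request waits until Pj releases.
        return "defer"
    # Pj is idle, or Pi's request is older: answer immediately.
    return "reply"
```

Deferred requests are the ones answered in rule 4, when the holder releases.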


Locking



Locking Mechanisms

Implies need for structuring of transactions

Constraints are required

Request rules for transactions, T:

■ Exclusive access

• Granted only if no other T has ANY type of lock on the object

■ Shared lock

• Granted if no other T has an Exclusive lock
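The two request rules amount to a compatibility check. A minimal sketch follows; the function name `can_grant` and the lock-mode strings are assumptions made for illustration.

```python
def can_grant(requested: str, held_by_others: list[str]) -> bool:
    """Apply the request rules to one object.

    requested      -- "exclusive" or "shared"
    held_by_others -- lock modes other transactions hold on the same object
    """
    if requested == "exclusive":
        # Exclusive: no other transaction may hold ANY lock on the object.
        return len(held_by_others) == 0
    if requested == "shared":
        # Shared: allowed unless some other transaction holds an exclusive lock.
        return "exclusive" not in held_by_others
    raise ValueError(f"unknown lock mode: {requested}")
```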



Transactions

Well-formed

■ Reads only if it has a shared or exclusive lock

■ Writes only if it has an exclusive lock

Two phase

■ Does not request a lock after releasing a lock

Strong two phase

■ All unlocks are done at the end of T
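The two-phase property can be checked over a transaction's sequence of lock operations. This is a minimal sketch under an assumed representation: each operation is an `(action, object)` pair with action `"lock"` or `"unlock"`, and `is_two_phase` is a hypothetical name.

```python
def is_two_phase(ops) -> bool:
    """Return True if no lock is requested after the first unlock."""
    released = False
    for action, _obj in ops:
        if action == "unlock":
            released = True       # the shrinking phase has begun
        elif action == "lock" and released:
            return False          # growing again after shrinking breaks two-phase
    return True
```

A strong two-phase transaction passes this check trivially, because all of its unlocks come at the very end.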



Basics

Mutual Exclusion



Atomic actions

“Appear” as if done in parallel

■ “Could” be interruptible

■ Places where done:

• Hardware – machine state switching

• Kernel code

– Semaphores, mutexes, condvars

– Machine state switching

• Library code – when library manages switching

– Semaphores, mutexes, condvars

– Thread state switching



Threads

Two types

■ Library-supported

• Atomic actions occur inside the library functions

• May use kernel-supported atomic actions

• May be supplied with system or added on (Linux+pthreads vs. Windows+pthreads)

• Thread blocking is dependent on library design

■ Kernel-supported

• Atomic actions occur inside kernel code

• Thread blocking done by kernel



Mutual Exclusion

Mechanism for critical section safety

Semaphores

■ Binary

■ Counting

■ Any thread can signal

Mutexes

■ Only locker can unlock

Monitors

■ Use condition variables and mutexes

■ Like a “class” in C++/Java
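A monitor, as described above, bundles shared data with a mutex and condition variables inside a class. The sketch below uses Python's standard `threading` module; the `BoundedBuffer` class is an illustrative example chosen here, not something taken from the slides.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style buffer: one mutex plus two condition variables."""

    def __init__(self, capacity: int):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()                      # the monitor's mutex
        self._not_full = threading.Condition(self._lock)   # waited on by producers
        self._not_empty = threading.Condition(self._lock)  # waited on by consumers

    def put(self, item):
        with self._lock:
            while len(self._items) >= self._capacity:
                self._not_full.wait()    # releases the mutex while blocked
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item
```

Callers never touch the lock directly; every entry point acquires it, which is what makes the class behave like a monitor.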



Addresses & pointers

Pointer specifies a memory address

■ “Could” be a virtual address (when is it not?)

■ Must be translated to a “real” address

■ What is a pointer inside the kernel?

■ How does the kernel access user space?



O/S Work Flow

1. Initialize, then create a “main process”

2. Display a user interface

3. Wait for an interrupt. 2 cases:

a. CPU is idle (NO instructions are processing)

b. A waiting process is allowed to run

4. h/w applies interrupt voltage

5. Processor switches to handler in kernel-mode

6. Interrupt is handled

7. Scheduler is called, then either:

a. Wait for an interrupt (go back to 3a)

b. Resume a ready process (from 3b above)



Linux architecture


App 1 App 2 App n

System Call Interface

Kernel Subsystems (e.g., I/O)

Device drivers

Kernel space

Hardware


Interrupt handling -1

Interrupt handlers are asynchronous

■ May interrupt other interrupt handlers

■ May run with current interrupt-level disabled

■ May run with all interrupts disabled

■ May be timing dependent

■ Must be fast

■ MUST NOT BLOCK!!!

Divided into 2 parts

■ “top halves” (Interrupt handler)

■ “bottom halves” (leftover code from top half)



Interrupt handling -2

Top halves

■ Asynchronous

■ Ack receipt (“talks” to h/w)

■ Copies data to/from h/w

■ Non-interruptible

■ MUST BE short & FAST!

Bottom halves

■ Deferred to “later” (i.e., when system is not busy)

■ Interrupts enabled

■ May be long, slow
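The split between top and bottom halves is essentially deferred work on a queue. The following is a user-space simulation only, with invented names (`top_half`, `run_bottom_halves`, `deferred`); it does not use or imitate any real kernel API.

```python
from collections import deque

deferred = deque()  # work queued by top halves for later processing


def top_half(device_data: bytes) -> None:
    """Short and fast: copy the data out and queue the rest of the work."""
    deferred.append(bytes(device_data))


def run_bottom_halves() -> list[str]:
    """Runs later, when the system is not busy; may be long and slow."""
    results = []
    while deferred:
        data = deferred.popleft()
        results.append(data.decode().upper())  # stand-in for the slow processing
    return results
```

The top half does only the copy and the enqueue, so it returns quickly; everything expensive happens in the drain loop.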



Interrupt handling -3

Original design:

■ 32 BH’s

■ Main Int Hdlr sets a bit to get one called

■ No extensibility

■ Globally sync’d (cannot run 2 at same time)

New design (2.6 kernels)

■ Introduced use of queues

■ Softirq’s & tasklets

■ Replace the BH mechanism

• but the work is still deferred, still called BH’s



System Designs

Why are these good/bad ideas?

■ Pageable kernel memory (found in some Unix’s, not in Linux)

■ Monolithic static kernels (Unix, VM, MVS)

■ Interruptible interrupt handlers



How Kernel differs from user apps

No C library calls allowed (why?)

GNU C or newer Intel compilers only

ISO C99 extensions allowed

Hard to use floating point

■ fp mode switch in PC’s

Small fixed-size stack on PC’s (8KB or 16KB for 32- or 64-bit machine)

Synch is a major concern (why?)

Portability is a design point



Synchronization (in the kernel)

Must support SMP (Symmetric MultiProcessing)

Interrupts are asynchronous

Parts of the kernel are preemptable (interruptible)

Kernel must protect itself AND users



Kernel threads (threads that run in & for the kernel)

No address space (MM pointer=null)

No context switch to user-space to run

Schedulable!

Interruptible!

E.g.,

■ pdflush (dirty-page write-back)

■ ksoftirqd (the kernel soft IRQ daemon)

■ (see next page for details)



Kernel thread examples & notes

pdflush (dirty page write-back)

■ Free RAM drops below a threshold

■ Dirty data grows older than a threshold

■ Page-writes are queued

■ Handled when threshold is passed

ksoftirqd (the soft IRQ daemon)

■ Queuing TCP/IP packets

■ Handled after hard IRQ’s and at sched.c

■ NOT preemptable!!!



Page Replacement Algorithms



The Clock

Init: Create a circular list of frames, set ptr to newest

Do_page_fault() {
    ptr = ptr->next
    If (no criterion used) victim found            // ≈ FIFO
    Else if (Referenced == 0) then
        If (Dirty == 1) schedule for cleaning
        Else {victim found}                        // ≈ LRU
    Else Referenced = 0                            // second chance: advance ptr and repeat
}
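The Clock scan can be sketched in runnable form. This is a simplified model under stated assumptions: frames are dictionaries with `referenced` and `dirty` flags, a referenced frame gets a second chance, and "schedule for cleaning" is modelled by recording the frame index and clearing its dirty bit at once (a real system would clean asynchronously). The name `clock_pick_victim` is hypothetical.

```python
def clock_pick_victim(frames, hand):
    """Advance the clock hand until a victim frame is found.

    frames -- non-empty list of dicts with boolean 'referenced' and 'dirty' keys
    hand   -- index of the frame the pointer currently rests on
    Returns (victim_index, indices_scheduled_for_cleaning).
    """
    to_clean = []
    while True:
        hand = (hand + 1) % len(frames)   # ptr = ptr->next on a circular list
        frame = frames[hand]
        if frame["referenced"]:
            frame["referenced"] = False   # second chance
        elif frame["dirty"]:
            to_clean.append(hand)         # schedule for cleaning, keep scanning
            frame["dirty"] = False        # assumption: cleaning finishes before the hand returns
        else:
            return hand, to_clean         # not referenced and clean: victim found
```

The loop always terminates, because each pass clears bits and a frame with both bits clear is selected.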



WSClock

If (R == 1)
    LR[f] = Process CPU Time (PT)
    R = 0
Else    // not ref’d in last cycle
    if PT - LR[f] > T {victim found}    // page is older than T

* T is the working set window size (units=time)

LR is the array of Last Reference times

Note: LRU needs hw to set time at each reference
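The WSClock test can be sketched as one sweep of the clock hand. The frame representation (dictionaries with `referenced` and `last_ref` keys) and the name `wsclock_pick_victim` are assumptions for illustration; returning `None` when no page falls outside the window is also a choice made here, since the slide does not say what happens in that case.

```python
def wsclock_pick_victim(frames, hand, process_time, window):
    """Make one full sweep and return the index of the first page older than window.

    frames       -- list of dicts with 'referenced' (bool) and 'last_ref' (time)
    process_time -- current process CPU time (PT)
    window       -- working set window size T, in the same time units
    """
    for _ in range(len(frames)):
        hand = (hand + 1) % len(frames)
        frame = frames[hand]
        if frame["referenced"]:
            frame["last_ref"] = process_time   # LR[f] = PT
            frame["referenced"] = False        # R = 0
        elif process_time - frame["last_ref"] > window:
            return hand                        # outside the working set: victim
    return None                                # every page is still in the working set
```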



Paging: Direct mapping

[Figure: the computed (virtual) address in user space is split into page bits (page # P) and offset bits; P is added to the start of the page table to select the entry holding frame F; the frame bits plus the offset bits form the computed (real) address.]

F=pagetable[P]
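The direct-mapped lookup F=pagetable[P] can be written out as address arithmetic. A minimal sketch, assuming a 4 KiB page size and a page table stored as a plain list of frame numbers (both assumptions; the slide fixes neither):

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(virtual_address: int, page_table: list[int]) -> int:
    """Translate a virtual address to a real address by direct table indexing."""
    page, offset = divmod(virtual_address, PAGE_SIZE)  # split into P and offset
    frame = page_table[page]                           # F = pagetable[P]
    return frame * PAGE_SIZE + offset                  # frame bits + offset bits
```

The offset passes through unchanged; only the page number is replaced by the frame number.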


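Paging: associative table

[Figure: the virtual address is split into page bits (page # p) and offset bits; p is compared against the Virt. Page # (VPN) column of the associative table, and the matching entry supplies the frame # F; the frame bits plus the offset bits form the real address.]

All compares are simultaneous

F ≠ pagetable[P]; F=match(VPN(p))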


Paging: inverted table

[Figure: the page # (p) taken from the page bits of the virtual address is hashed; hash(p) is added to the start of the page table to select the entry that supplies frame F; the frame bits plus the offset bits form the real address.]

F=pagetable[hash(P)]
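The hashed lookup F=pagetable[hash(P)] can be sketched as follows. The table size, the modulo hash, and the `(page, frame)` entry layout are all assumptions made for illustration, and collision handling is deliberately left out: a slot holding a different page is simply reported as a miss.

```python
TABLE_SIZE = 8  # assumed number of table slots


def hash_page(page: int) -> int:
    """Toy hash function; a real system would use something stronger."""
    return page % TABLE_SIZE


def lookup_frame(page: int, table: list):
    """Return the frame for page, or None on a miss.

    table -- list of TABLE_SIZE slots, each None or a (page, frame) tuple
    """
    entry = table[hash_page(page)]     # pagetable[hash(P)]
    if entry is None or entry[0] != page:
        return None                    # empty slot or a different page (collision)
    return entry[1]
```

Storing the page number in each entry is what lets the lookup detect that a slot belongs to another page.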


Paging effects

[Figure: inter-fault time plotted against # frames allocated to Pi; marked values are the total execution time of Pi and the number of pages in Pi.]

Allocating more frames to Pi increases inter-fault time (less paging occurs for Pi)


Linux page replacement

Variant of Clock algorithm

2 linked lists

■ Active pages (A) (referenced recently)

• Never used as victims

■ Inactive pages (I)

■ Most recently used at head of each list

New page -> inactive, marked Ref’d

MM checks all pages; if Ref’d, R=1

If inactive & R=1 already,

■ move to head(A)

■ R=0

Periodic check to move from A to I
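The two-list scheme can be modelled in a few lines. This is a simplified sketch of the rules listed above, not the kernel's actual code: the class name `TwoListPager` and its methods are invented for illustration, index 0 of each deque stands for the head of the list, and victims are taken from the tail of the inactive list.

```python
from collections import deque


class TwoListPager:
    """Toy model of active (A) and inactive (I) page lists."""

    def __init__(self):
        self.active = deque()     # index 0 = head = most recently used
        self.inactive = deque()
        self.referenced = {}      # page -> R bit

    def add_page(self, page):
        """New page goes to the inactive list, marked referenced."""
        self.inactive.appendleft(page)
        self.referenced[page] = True

    def touch(self, page):
        """Record a reference; promote if inactive and R was already set."""
        if page in self.inactive and self.referenced[page]:
            self.inactive.remove(page)
            self.active.appendleft(page)   # move to head(A)
            self.referenced[page] = False  # R=0
        else:
            self.referenced[page] = True

    def demote_oldest_active(self):
        """Periodic check: move the tail of A to the head of I."""
        if self.active:
            self.inactive.appendleft(self.active.pop())

    def pick_victim(self):
        """Victims come only from the inactive list (raises IndexError if empty)."""
        page = self.inactive.pop()
        del self.referenced[page]
        return page
```

Pages on the active list are never chosen directly; they must first be demoted by the periodic check.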

