Post on 21-Dec-2015
Operating Systems
Course Review & Extensions
Distributed Systems
Fault Tolerance
© 2011, D. J. Foreman
Byzantine Generals Problem
Byzantine failures = arbitrary failures
■ Crashes, incorrect results, etc.
A problem for fault-tolerant distributed systems
System rules:
a. All loyal generals apply the same action plan
b. A small # of traitors cannot force a bad plan
c. Every system must receive the same info
■ N = total # of systems
■ T = # of “traitors” (i.e., failing systems)
Basic solution requires N > 3T (i.e., N >= 3T+1)
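The N > 3T bound can be checked mechanically. A minimal sketch in Python; the function names `can_tolerate` and `max_traitors` are illustrative and do not come from the slides.

```python
def can_tolerate(n: int, t: int) -> bool:
    """True if n systems satisfy the bound N > 3T for t traitors."""
    return n > 3 * t


def max_traitors(n: int) -> int:
    """Largest T such that N >= 3T + 1 still holds."""
    return max(0, (n - 1) // 3)
```

For example, 4 systems can tolerate 1 traitor, while 3 systems cannot tolerate any.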
The dilemma
[Figure: three processes P1, P2 and P3 exchanging messages; the values shown are 1, 1 and 0]
The dilemma – pt 2
[Figure: three processes P1, P2 and P3 exchanging messages; the values shown are 1, 0 and 1]
Who sent the bad message, P1 or P3?
Application of the algorithm
Two stages:
1. All Pi send messages to the n-2 other Pi
   (not back to the message originator)
2. All Pi decide on an action
There are T rounds of messages
For N processes, Pi sends N-1 messages in Round 1,
then (N-1)*(N-2) in Round 2,
and (N-1)*(N-2)*(N-3) in Round 3, etc.
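The per-round message counts above follow a falling product. A small Python sketch of that arithmetic; `messages_in_round` and `total_messages` are hypothetical helper names, and the formula is taken directly from the slide.

```python
from math import prod


def messages_in_round(n: int, r: int) -> int:
    """(N-1)*(N-2)*...*(N-r): the slide's message count for round r."""
    return prod(n - k for k in range(1, r + 1))


def total_messages(n: int, rounds: int) -> int:
    """Sum of the per-round counts over rounds 1..rounds."""
    return sum(messages_in_round(n, r) for r in range(1, rounds + 1))
```

The count grows quickly: with N=7, round 3 alone already needs 6*5*4 = 120 messages.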
Distributed Systems
Mutual Exclusion Algorithms
Timestamp mechanisms
Lamport’s Algorithm - pt 1
Assumptions:
1. Request R from Pi is time-stamped (Ti,i), where Ti=Ci, which is Pi’s CPU time
2. Pi has a request queue RQi ordered by >=
[see Algorithm on next slide]
Verification:
1. Rule 3b & the assumption that requests R are received in order guarantee that Pi has learned about all requests preceding R
2. Since >= totally orders the requests Rn, rule 3a provides mutual exclusion
Cost: N-1 requests, N-1 replies, N-1 releases, i.e. 3(N-1) messages
Lamport’s Algorithm - pt 2
Rules:
1. Pi puts Req on RQi & sends Req to all other Pj
2. When Pj gets Req, it puts it on RQj & acks
3. Pi is allowed access when a & b are both true:
   a. Pi’s own Req is at the front of RQi
   b. Pi has received a message with Tj>Ti from every other Pj
4. To release the resource, Pi pulls its Req from RQi & sends a time-stamped RELEASE to all other Pj
5. When Pj receives the RELEASE, Pj pulls Pi’s Req from its own RQj
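The five rules can be traced with a small single-threaded simulation. This is only a sketch under simplifying assumptions: messages are delivered instantly and in order by direct method calls, the ack is the return value of `on_request`, and the class and method names (`Node`, `can_enter`, and so on) are illustrative rather than part of the slides.

```python
import heapq


class Node:
    def __init__(self, pid, n):
        self.pid, self.clock = pid, 0
        self.rq = []                                       # RQi: heap of (T, pid)
        self.seen = {j: 0 for j in range(n) if j != pid}   # latest timestamp heard from each Pj
        self.req = None                                    # own pending request, if any

    def _recv(self, ts, sender):
        # Record the sender's timestamp and advance the local clock past it.
        self.seen[sender] = ts
        self.clock = max(self.clock, ts) + 1

    def request(self, peers):                              # rule 1
        self.clock += 1
        self.req = (self.clock, self.pid)
        heapq.heappush(self.rq, self.req)
        for p in peers:
            ack_ts = p.on_request(self.req)                # rule 2: ack comes back directly
            self._recv(ack_ts, p.pid)

    def on_request(self, req):                             # rule 2
        self._recv(req[0], req[1])
        heapq.heappush(self.rq, req)
        return self.clock                                  # time-stamped ack

    def can_enter(self):                                   # rule 3 (a and b)
        return (self.req is not None and self.rq[0] == self.req
                and all((t, j) > self.req for j, t in self.seen.items()))

    def release(self, peers):                              # rule 4
        self.clock += 1
        self.rq.remove(self.req)
        heapq.heapify(self.rq)
        for p in peers:
            p.on_release(self.req, self.clock)
        self.req = None

    def on_release(self, req, ts):                         # rule 5
        self._recv(ts, req[1])
        self.rq.remove(req)
        heapq.heapify(self.rq)
```

With two nodes that both request, only the node with the smaller (T, i) pair passes `can_enter()`; the other passes once the RELEASE has removed the earlier request from its queue.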
Ricart & Agrawala’s Algorithm
More efficient than Lamport’s algorithm
(needs only 2(N-1) messages)
Rules:
1. Pi puts Req on RQi & sends Req to all other Pj
2. When Pj gets Req:
   a. If Pj is not also requesting, Pj sends Reply to Pi
   b. If Pj IS also requesting, and (Tj,j)<(Ti,i), Pj keeps (defers) (Ti,i); else Pj sends Reply to Pi
3. When Pi gets Reply from all other Pj, Req is granted
4. When Pi releases the resource, it sends Reply for all pending Req’s
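A compact way to see the deferred-reply idea is a single-threaded simulation. The sketch below assumes instant, in-order delivery via direct method calls; `RANode` and its method names are hypothetical, chosen only for this illustration.

```python
class RANode:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.my_req = None     # (Ti, i) while requesting, else None
        self.needed = 0        # number of replies required
        self.replies = 0       # number of replies received so far
        self.deferred = []     # requesters whose Reply is being held back

    def request(self, peers):                  # rule 1
        self.clock += 1
        self.my_req = (self.clock, self.pid)
        self.needed, self.replies = len(peers), 0
        for p in peers:
            p.on_request(self.my_req, self)

    def on_request(self, req, sender):         # rule 2
        self.clock = max(self.clock, req[0]) + 1
        if self.my_req is not None and self.my_req < req:
            self.deferred.append(sender)       # 2b: own request is older, hold the Reply
        else:
            sender.on_reply()                  # 2a, or 2b when the sender's request is older

    def on_reply(self):
        self.replies += 1

    def granted(self):                         # rule 3
        return self.my_req is not None and self.replies == self.needed

    def release(self):                         # rule 4
        self.my_req = None
        pending, self.deferred = self.deferred, []
        for p in pending:
            p.on_reply()
```

Each request costs N-1 Req messages plus N-1 Reply messages, which matches the 2(N-1) figure; there is no separate RELEASE message because the held-back Reply plays that role.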
Locking
Locking Mechanisms
Implies need for structuring of transactions
Constraints are required
Request rules for transactions, T:
■ Exclusive access
  • Granted only if no other T has ANY type of lock on the object
■ Shared lock
  • Granted only if no other T has an Exclusive lock
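The two request rules amount to a small compatibility check. A minimal Python sketch; the mode strings "S" and "X" and the function name `can_grant` are assumptions made for the example.

```python
def can_grant(requested, held_by_others):
    """Decide whether a lock request on one object can be granted.

    requested: "S" (shared) or "X" (exclusive).
    held_by_others: list of modes held on the object by other transactions.
    """
    if requested == "X":
        return len(held_by_others) == 0       # no other lock of ANY type
    if requested == "S":
        return "X" not in held_by_others      # no other exclusive lock
    raise ValueError("unknown lock mode: %r" % (requested,))
```

Shared locks are compatible with each other, so `can_grant("S", ["S", "S"])` succeeds while `can_grant("X", ["S"])` does not.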
Transactions
Well-formed
■ Reads only if it has a shared or exclusive lock
■ Writes only if it has an exclusive lock
Two phase
■ Does not request a lock after releasing a lock
Strong two phase
■ All unlocks are done at the end of T
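Both two-phase properties can be checked on a transaction's operation sequence. The sketch below assumes a transaction is given as a list of `(action, object)` pairs with actions such as "lock", "unlock", "read" and "write"; that representation and the function names are illustrative only.

```python
def is_two_phase(ops):
    """True if no lock is requested after any lock has been released."""
    released = False
    for action, _obj in ops:
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False
    return True


def is_strong_two_phase(ops):
    """True if all unlocks come at the very end of the transaction."""
    seen_unlock = False
    for action, _obj in ops:
        if action == "unlock":
            seen_unlock = True
        elif seen_unlock:
            return False          # some other operation follows an unlock
    return True
```

A transaction that unlocks one object and then keeps writing another is two phase but not strong two phase.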
Basics
Mutual Exclusion
Atomic actions
“Appear” as if done in parallel
■ “Could” be interruptible
■ Places where done:
  • Hardware – machine state switching
  • Kernel code
    – Semaphores, mutexes, condvars
    – Machine state switching
  • Library code – when library manages switching
    – Semaphores, mutexes, condvars
    – Thread state switching
Threads
Two types
■ Library-supported
  • Atomic actions occur inside the library functions
  • May use kernel-supported atomic actions
  • May be supplied with system or added on (Linux+pthreads vs. Windows+pthreads)
  • Thread blocking is dependent on library design
■ Kernel-supported
  • Atomic actions occur inside kernel code
  • Thread blocking done by kernel
Mutual Exclusion
Mechanism for critical section safety
Semaphores
■ Binary
■ Counting
■ Any thread can signal
Mutexes
■ Only locker can unlock
Monitors
■ Use condition variables and mutexes
■ Like a “class” in C++/Java
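The monitor idea (a class whose methods share one mutex and wait on condition variables) can be sketched with Python's `threading` module. The `BoundedBuffer` class below is an illustrative example, not something from the slides.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style bounded buffer: one mutex, two condition variables."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:                         # enter the monitor
            while len(self._items) >= self._capacity:
                self._not_full.wait()            # releases the mutex while waiting
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item
```

A producer thread calling `put` and a consumer calling `get` never touch `_items` at the same time, and each blocks (rather than spins) when the buffer is full or empty.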
Addresses & pointers
Pointer specifies a memory address
■ “Could” be a virtual address (when is it not?)
■ Must be translated to a “real” address
■ What is a pointer inside the kernel?
■ How does the kernel access user space?
O/S Work Flow
1. Initialize, then create a “main process”
2. Display a user interface
3. Wait for an interrupt. 2 cases:
   a. CPU is idle (NO instructions are processing)
   b. A waiting process is allowed to run
4. h/w applies interrupt voltage
5. Processor switches to handler in kernel-mode
6. Interrupt is handled
7. Scheduler is called, then either:
   a. Wait for an interrupt (go back to 3a)
   b. Resume a ready process (from 3b above)
Linux architecture
[Figure: layered diagram, top to bottom]
App 1, App 2, ..., App n
System Call Interface
Kernel Subsystems (e.g., I/O)
Device drivers
Hardware
(Kernel space: the layers between the apps and the hardware)
Interrupt handling - 1
Interrupt handlers are asynchronous
■ May interrupt other interrupt handlers
■ May run with current interrupt-level disabled
■ May run with all interrupts disabled
■ May be timing dependent
■ Must be fast
■ MUST NOT BLOCK!!!
Divided into 2 parts
■ “top halves” (Interrupt handler)
■ “bottom halves” (leftover code from top half)
Interrupt handling - 2
Top halves
■ Asynchronous
■ Ack receipt (“talks” to h/w)
■ Copies data to/from h/w
■ Non-interruptible
■ MUST BE short & FAST!
Bottom halves
■ Deferred to “later” (i.e., when system is not busy)
■ Interrupts enabled
■ May be long, slow
Interrupt handling - 3
Original design:
■ 32 BH’s
■ Main Int Hdlr sets a bit to get one called
■ No extensibility
■ Globally sync’d (cannot run 2 at same time)
New design (2.6 kernels)
■ Introduced use of queues
■ Softirq’s & tasklets
■ Replace the BH mechanism
  • but the work is still deferred, still called BH’s
System Designs
Why are these good/bad ideas?
■ Pageable kernel memory (found in some Unix’s, not in Linux)
■ Monolithic static kernels (Unix, VM, MVS)
■ Interruptible interrupt handlers
How Kernel differs from user apps
No C library calls allowed (why?)
GNU C or newer Intel compilers only
ISO C99 extensions allowed
Hard to use floating point
■ fp mode switch in PC’s
Small fixed-size stack on PC’s (8KB or 16KB for 32- or 64-bit machine)
Synch is a major concern (why?)
Portability is a design point
Synchronization (in the kernel)
Must support SMP (Symmetric MultiProcessing)
Interrupts are asynchronous
Parts of the kernel are preemptible (interruptible)
Kernel must protect itself AND users
Kernel threads
(threads that run in & for the kernel)
No address space (MM pointer=null)
No context switch to user-space to run
Schedulable!
Interruptible!
E.g.:
■ pdflush (dirty-page write-back)
■ ksoftirqd (the kernel soft IRQ daemon)
■ (see next page for details)
Kernel thread examples & notes
pdflush (dirty page write-back)
■ Free RAM drops below a threshold
■ Dirty data grows older than a threshold
■ Page-writes are queued
■ Handled when threshold is passed
ksoftirqd (the soft IRQ daemon)
■ Queuing TCP/IP packets
■ Handled after hard IRQ’s and at sched.c
■ NOT preemptable!!!
Page Replacement Algorithms
The Clock
Init: Create a circular list of frames, set ptr to newest

Do_page_fault() {
  Repeat until victim found:
    ptr = ptr->next
    If (no criterion used) victim found        // ≈ FIFO
    Else if (Referenced == 0) then
      If (Dirty == 1) schedule for cleaning
      Else { victim found }                    // ≈ LRU
    Else Referenced = 0                        // give the page a second chance
}
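A runnable sketch of the Clock scan in Python. Two things here are assumptions added for the example rather than taken from the slide: referenced pages get their bit cleared as the hand passes (the usual second-chance behavior), and a page scheduled for cleaning is treated as clean by the time the hand returns. The names `Frame` and `clock_select_victim` are hypothetical.

```python
class Frame:
    def __init__(self, page, referenced=False, dirty=False):
        self.page = page
        self.referenced = referenced
        self.dirty = dirty


def clock_select_victim(frames, hand):
    """Scan the circular frame list starting after `hand`.

    Returns (victim_index, pages_scheduled_for_cleaning).
    """
    n = len(frames)
    to_clean = []
    for _ in range(2 * n + 1):            # enough steps even if every frame is referenced and dirty
        hand = (hand + 1) % n
        f = frames[hand]
        if f.referenced:
            f.referenced = False          # assumption: second chance, clear the bit
        elif f.dirty:
            to_clean.append(f.page)       # schedule for cleaning
            f.dirty = False               # assumption: write-back completes before the next pass
        else:
            return hand, to_clean         # unreferenced and clean: victim found
    raise RuntimeError("no victim found")
```

After a victim is chosen, the caller would load the new page into that frame and leave the hand pointing at it, so that it is the "newest" frame on the next fault.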
WSClock
If (R==1)
  LR[f]=Process CPU Time
  R=0
Else // not ref’d in last cycle
  if PT-LR[f]>T {victim found} // page is older than T
* T is working set window size (units=time)
* PT is the Process CPU Time; LR is array of Last Reference times
Note: LRU needs hw to set time at each reference
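The WSClock test can be written as a short scan over the frames. In this Python sketch `R` and `LR` are parallel lists indexed by frame, `pt` is the process CPU time and `T` the working-set window; the function name and the `None` return for "no page older than T" are assumptions of the example.

```python
def wsclock_select_victim(R, LR, pt, T, hand):
    """Return the index of the first frame that is unreferenced and older than T.

    R:  list of reference bits, one per frame (updated in place)
    LR: list of last-reference times, one per frame (updated in place)
    Returns None if no frame falls outside the working-set window.
    """
    n = len(R)
    for _ in range(2 * n):                # after one full pass every R bit is clear
        hand = (hand + 1) % n
        if R[hand]:
            LR[hand] = pt                 # referenced in the last cycle: refresh its time
            R[hand] = False
        elif pt - LR[hand] > T:
            return hand                   # page is older than T: victim found
    return None                           # a real system needs a fallback policy here
```

Unlike true LRU, the time stamp is only refreshed when the hand visits a frame whose R bit is set, so no hardware time-stamping on every reference is required.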
Paging: Direct mapping
[Figure: the computed (virtual) address in user space is split into page bits (page # P) and offset bits. Start of page table + P indexes the Page Table, which yields frame F. The frame bits followed by the offset bits form the computed (real) address.]
F=pagetable[P]
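The direct-mapped lookup F=pagetable[P] can be shown with plain bit operations. A minimal Python sketch; the 12 offset bits (4 KB pages) and the name `translate` are assumptions chosen for the example.

```python
OFFSET_BITS = 12                      # assumed page size: 4 KB
PAGE_SIZE = 1 << OFFSET_BITS


def translate(vaddr, page_table):
    """Map a virtual address to a real address via a direct-mapped page table."""
    p = vaddr >> OFFSET_BITS          # page bits: page number P
    offset = vaddr & (PAGE_SIZE - 1)  # offset bits pass through unchanged
    f = page_table[p]                 # F = pagetable[P]
    return (f << OFFSET_BITS) | offset
```

With `page_table = [5, 9, 2]`, virtual address 0x1ABC is on page 1, which maps to frame 9, so the real address is 0x9ABC.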
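Paging: associative table
[Figure: the computed (virtual) address in user space is split into page bits (page # p) and offset bits. The page # is compared against the Virt. Page # (VPN) field of every table entry; the matching entry supplies Frame # F. The frame bits followed by the offset bits form the computed (real) address.]
All compares are simultaneous
F ≠ pagetable[P]
F = match(VPN(p))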
Paging: inverted table
[Figure: the computed (virtual) address in user space is split into page bits (page # p) and offset bits. Start of page table + hash(p) indexes the Page Table, which yields frame F. The frame bits followed by the offset bits form the computed (real) address.]
F=pagetable[hash(P)]
Paging effects
[Figure: plot with the labels “# frames allocated to Pi”, “Inter-fault time”, “Total execution time of Pi” and “Number of Pages in Pi”]
Allocating more frames to Pi increases inter-fault time (less paging occurs for Pi)
Linux page replacement
Variant of Clock algorithm
2 linked lists
■ Active pages (A) (referenced recently)
  • Never used as victims
■ Inactive pages (I)
■ Most recently used at head of each list
New page -> inactive, marked Ref’d
MM checks all pages; if Ref’d, R=1
If inactive & R=1 already,
■ move to head(A)
■ R=0
Periodic check to move from A to I
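The two-list scheme can be sketched with two deques. This Python example follows the rules listed above; taking victims from the tail of the inactive list and demoting from the tail of the active list are assumptions of the sketch, and `TwoListPager` with its method names is hypothetical, not the kernel's actual interface.

```python
from collections import deque


class TwoListPager:
    def __init__(self):
        self.active = deque()      # list A, head at index 0
        self.inactive = deque()    # list I, head at index 0
        self.ref = {}              # page -> R bit

    def add_page(self, page):
        """New page goes to the inactive list, marked referenced."""
        self.inactive.appendleft(page)
        self.ref[page] = True

    def touch(self, page):
        """Record a reference to a resident page."""
        if page in self.inactive and self.ref[page]:
            self.inactive.remove(page)     # inactive & R=1 already:
            self.active.appendleft(page)   #   move to head(A)
            self.ref[page] = False         #   and clear R
        else:
            self.ref[page] = True

    def demote_oldest(self):
        """Periodic check: move the tail of A to the head of I (assumed policy)."""
        if self.active:
            self.inactive.appendleft(self.active.pop())

    def pick_victim(self):
        """Evict from the tail of I; active pages are never victims."""
        if not self.inactive:
            return None
        page = self.inactive.pop()
        del self.ref[page]
        return page
```

Pages that are referenced again after being loaded migrate to the active list and are protected from eviction until the periodic check demotes them.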