Fault Tolerant Distributed Computing System.
What is a fault? A fault is a blemish, weakness, or shortcoming
of a particular hardware or software component.
Fault, error and failures
Why fault tolerant? Availability, reliability, dependability, …
How to provide fault tolerance?
• Replication
• Checkpointing and message logging
• Hybrid approaches
Fundamentals
Message Logging
Tolerate crash failures. Each process periodically records its local state and logs the messages it received after that checkpoint. Once a crashed process recovers, its state must be consistent with the states of the other processes.
Orphan processes
• surviving processes whose states are inconsistent with the recovered state of a crashed process
Message logging protocols guarantee that upon recovery no process is an orphan.
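A minimal sketch of this recovery idea in Java (all names here are illustrative, not from any of the systems below): restore the last checkpoint, then replay the logged messages in their original order, so the recovered state matches what the surviving processes have already observed.

    import java.util.List;

    interface LoggedMessage {}
    interface State { State deliver(LoggedMessage m); }   // application state
    interface Checkpoint { State restore(); }             // saved local state

    class RecoveringProcess {
        private State state;

        void recover(Checkpoint lastCheckpoint, List<LoggedMessage> log) {
            state = lastCheckpoint.restore();  // roll back to the checkpoint
            for (LoggedMessage m : log) {      // re-deliver logged messages in
                state = state.deliver(m);      // their original receipt order
            }
            // execution now continues as if the crash never happened, so no
            // surviving process becomes an orphan
        }
    }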
Message logging protocols
Pessimistic Message Logging
• avoids the creation of orphans during execution
• no process p sends a message m until it knows that all messages delivered before sending m are logged; this gives quick recovery
• can block a process for each message it receives, which slows down throughput
• allows processes to communicate only from recoverable states: any information that may be needed for recovery is synchronously logged to stable storage before the process is allowed to communicate (sketched below)
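A sketch of that discipline with hypothetical names: each received message is forced to stable storage before the process may send, so every state it communicates from is recoverable.

    import java.io.*;

    class PessimisticLogger {
        private final FileOutputStream fileOut;
        private final ObjectOutputStream stableLog;

        PessimisticLogger(File logFile) throws IOException {
            fileOut = new FileOutputStream(logFile, true);
            stableLog = new ObjectOutputStream(fileOut);
        }

        synchronized void onReceive(Serializable message) throws IOException {
            // log the received message and force it to disk before returning:
            // this synchronous write is the step that slows down throughput
            // but enables quick recovery
            stableLog.writeObject(message);
            stableLog.flush();
            fileOut.getFD().sync();
        }

        void send(Serializable message, Channel out) {
            // safe: everything delivered before this send is already on disk
            out.transmit(message);
        }

        interface Channel { void transmit(Serializable m); }
    }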
Message Logging
Optimistic Message Logging
• takes appropriate actions during recovery to eliminate all orphans
• better performance during failure-free runs
• allows processes to communicate from non-recoverable states; failures may make these states permanently unrecoverable, forcing rollback of any process that depends on such states
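The trade-off, as a hypothetical Java sketch: logging happens asynchronously on a background thread, so receives never block, but any messages still buffered at crash time are lost, and processes that depend on them must roll back.

    import java.io.*;
    import java.util.concurrent.*;

    class OptimisticLogger {
        private final BlockingQueue<Serializable> buffer = new LinkedBlockingQueue<>();

        OptimisticLogger() {
            // the application thread never blocks on logging; a background
            // thread drains the buffer to disk when convenient
            Executors.newSingleThreadExecutor().submit(this::flushLoop);
        }

        void onReceive(Serializable message) {
            buffer.add(message);  // non-blocking: fast failure-free runs
        }

        private void flushLoop() {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream("msg.log", true))) {
                while (true) {
                    out.writeObject(buffer.take());
                    out.flush();      // messages that never reach this point
                }                     // are lost in a crash, forcing dependent
            } catch (Exception e) {}  // processes to roll back
        }
    }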
Causal Message Logging
• no orphans when failures happen, and processes are not blocked when failures do not occur
• weakens the condition imposed by pessimistic protocols
• allows the possibility that the state from which a process communicates is unrecoverable because of a failure, but only if this does not affect consistency
• appends to all communication the information needed to recover the state from which the communication originates; this information is replicated in the memory of the processes that causally depend on the originating state
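The core mechanism, sketched with hypothetical structures: each outgoing message piggybacks the determinants of the not-yet-stable non-deterministic events its sending state depends on, so every causally dependent process holds a replica of them in volatile memory.

    import java.util.*;

    class CausalProcess {
        // determinant of a receive event: sender, send sequence number,
        // receiver, and the order in which the message was delivered
        record Determinant(int source, int ssn, int dest, int rsn) {}
        record Packet(byte[] payload, Set<Determinant> piggybacked) {}

        private final Set<Determinant> unstable = new HashSet<>();
        private final int myId;
        private int nextRsn = 0;

        CausalProcess(int myId) { this.myId = myId; }

        Packet send(byte[] payload) {
            // piggyback the determinants the receiver will causally depend on
            return new Packet(payload, Set.copyOf(unstable));
        }

        void receive(int source, int ssn, Packet p) {
            unstable.addAll(p.piggybacked());  // replicate sender's determinants
            unstable.add(new Determinant(source, ssn, myId, nextRsn++));
            // deliver p.payload() to the application (elided)
        }
    }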
KAN – A Reliable Distributed Object System
Developed at UC Santa Barbara
Project goals:
• Language support for parallelism and distribution
• Transparent location, migration, and replication
• Optimized method invocation
• Fault tolerance
• Composition and proof reuse
System Description
[Figure: the Kan compiler translates Kan source into Java bytecode plus the Kan run-time libraries; the result runs on multiple JVMs that communicate over UNIX sockets.]
Fault Tolerance in Kan
Log-based forward recovery scheme: the log of recovery information for a node is maintained externally on other nodes. Failed nodes are recovered to their pre-failure states, and the correct nodes keep their states as of the time of the failures.
Only node crash failures are considered: a processor stops taking steps, and failures are eventually detected.
Basic Architecture of the Fault Tolerance Scheme
[Figure: each physical node i hosts logical nodes (x, y), a fault detector, a failure handler, and a request handler on top of a communication layer; physical nodes are reached over the network by IP address, and recovery information is kept in an external log on other nodes.]
Logical Ring
Use a logical ring to minimize the need for global synchronization and recovery. The ring is used only for logging (of remote method invocations).
Two parts:
• a static part containing the active correct nodes; it has a leader and a sense of direction: upstream and downstream
• a dynamic part containing the nodes that are trying to join the ring
A logical node is logged at the next T physical nodes in the ring, where T is the maximum number of node failures to tolerate (see the placement sketch below).
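A small sketch of that placement rule (all names illustrative): given the ring order and the physical node a logical node maps to, the log is replicated on the next T physical nodes downstream.

    import java.util.*;

    class RingPlacement {
        /** The T physical nodes that keep the log for the node at `home`. */
        static List<Integer> logSites(List<Integer> view, int home, int t) {
            List<Integer> sites = new ArrayList<>();
            int start = view.indexOf(home);
            for (int k = 1; k <= t; k++) {
                sites.add(view.get((start + k) % view.size()));  // wrap around
            }
            return sites;
        }

        public static void main(String[] args) {
            // ring of 5 physical nodes, tolerating T = 2 failures
            System.out.println(logSites(List.of(0, 1, 2, 3, 4), 3, 2));  // [4, 0]
        }
    }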
Logical Ring Maintenance
Each node i participating in the protocol maintains the following variables:
• Failed_i(j): true if i has detected the failure of j
• Map_i(x): the physical node on which logical node x resides
• Leader_i: i's view of the leader of the ring
• View_i: i's view of the logical ring (membership and order)
• Pending_i: the set of physical nodes that i suspects of having failed
• Recovery_count_i: the number of logical nodes that still need to be recovered
• Ready_i: records whether i is active; there is an initial set of ready nodes, and new nodes become ready when they are linked into the ring
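Collected into one illustrative Java class (the subscripted variables become fields and maps):

    import java.util.*;

    class NodeState {
        final Map<Integer, Boolean> failed = new HashMap<>();  // Failed_i(j)
        final Map<Integer, Integer> map = new HashMap<>();     // Map_i(x): logical -> physical
        int leader;                                            // Leader_i
        List<Integer> view = new ArrayList<>();                // View_i: membership and order
        final Set<Integer> pending = new HashSet<>();          // Pending_i: suspected nodes
        int recoveryCount;                                     // Recovery_count_i
        boolean ready;                                         // Ready_i: linked into the ring?
    }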
Failure Handling
When node i is informed of the failure of node j:
• If every node upstream of i has failed, then i must become the new leader. It remaps all logical nodes from the failed upstream physical nodes, informs the other correct nodes by sending a remap message, and then recovers those logical nodes.
• If the leader has failed but there is some upstream node k that will become the new leader, then i just updates its map and leader variables to reflect the new situation.
• If the failed node j is upstream of i, then i just updates its map. If i is the next downstream node from j, it also recovers the logical nodes from j.
• If j is downstream of i and there is some node k downstream of j, then i just updates its map.
• If j is downstream of i and there is no node downstream of j, then i waits for the leader to update the map.
• If i is the leader and must recover j, then it changes its map, sends a remap message to update the correct nodes' maps, and recovers all logical nodes that are now mapped locally.
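A compilable sketch of this case analysis, reusing the NodeState fields sketched above; the ring-geometry tests and the remap and recovery actions are stubs, since the slides leave their details open.

    class FailureHandler {
        private final int me;
        private final NodeState s;

        FailureHandler(int me, NodeState s) { this.me = me; this.s = s; }

        void onFailure(int j) {
            s.failed.put(j, true);
            if (allUpstreamFailed()) {
                s.leader = me;                    // become the new leader:
                remapFrom(j);                     // remap upstream logical nodes,
                broadcastRemap();                 // tell the other correct nodes,
                recoverLogicalNodesOf(j);         // then recover
            } else if (j == s.leader) {
                s.leader = newLeaderUpstream();   // some upstream k takes over;
                remapFrom(j);                     // just fix map and leader
            } else if (isUpstream(j)) {
                remapFrom(j);                     // fix the map; recover j's
                if (nextDownstreamOf(j) == me)    // nodes if i is j's next
                    recoverLogicalNodesOf(j);     // downstream node
            } else if (existsDownstreamOf(j)) {
                remapFrom(j);                     // downstream failure: fix map
            }
            // otherwise j was the last downstream node: wait for the leader's
            // remap message (the leader itself remaps and recovers j)
        }

        // stubs standing in for machinery the slides assume but do not define
        private boolean allUpstreamFailed()          { return false; }
        private boolean isUpstream(int j)            { return false; }
        private boolean existsDownstreamOf(int j)    { return false; }
        private int     newLeaderUpstream()          { return s.leader; }
        private int     nextDownstreamOf(int j)      { return -1; }
        private void    remapFrom(int j)             {}
        private void    broadcastRemap()             {}
        private void    recoverLogicalNodesOf(int j) {}
    }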
Physical Node and Leader Recovery
When a physical node comes back up, it sends a join message to the leader, and the leader tries to link the node into the ring:
• Acquire <-> Grant
• Add, Ack_add
• Release
When the leader fails, the next downstream node in the ring becomes the new leader.
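One plausible reading of that handshake, drawn as a message sequence (the message names are from the slide; the participants and ordering are our assumption):

    recovering node            leader               ring neighbors
         |---- join ----------->|                         |
         |                      |---- Acquire ----------->|  lock the splice point
         |                      |<--- Grant --------------|
         |                      |---- Add --------------->|  link the new node in
         |                      |<--- Ack_add ------------|
         |                      |---- Release ----------->|  unlock; node is ready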
AQuA
Adaptive Quality of Service Availability. Developed at UIUC and BBN.
Goal: allow distributed applications to request and obtain a desired level of availability.
Fault tolerance:
• replication
• reliable messaging
Features of AQuA
Uses the QuO runtime to process and make availability requests.
Uses the Proteus dependability manager to configure the system in response to faults and availability requests.
Uses Ensemble to provide group communication services.
Provides a CORBA interface to application objects through the AQuA gateway.
Proteus functionality
How to provide fault tolerance for an application:
• style of replication (active, passive)
• voting algorithm to use
• degree of replication
• type of faults to tolerate (crash, value, or time)
• location of replicas
How to implement the chosen fault-tolerance scheme:
• dynamic configuration modification
• start/kill replicas; activate/deactivate monitors and voters
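These choices can be pictured as one configuration record per application; the sketch below is entirely illustrative (AQuA's real interfaces are CORBA-based and not shown here).

    import java.util.*;

    record FaultToleranceConfig(
            ReplicationStyle style,       // active or passive replication
            VotingAlgorithm voter,        // which voting algorithm to use
            int degreeOfReplication,      // how many replicas to run
            Set<FaultType> tolerated,     // crash, value, and/or time faults
            List<String> replicaHosts) {  // where to place the replicas

        enum ReplicationStyle { ACTIVE, PASSIVE }
        enum VotingAlgorithm  { MAJORITY, FIRST_RESPONSE }
        enum FaultType        { CRASH, VALUE, TIME }
    }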
Group structure
For reliable multicast and point-to-point communication:
Replication groups
Connection groups
Proteus Communication Service group, for the replicated Proteus manager
• contains the replicas and the objects that communicate with the manager
• e.g., notification of view changes, new QuO requests
• ensures that all replica managers receive the same information
Point-to-point groups
• Proteus manager to object factory
AQuA Architecture
Fault Model, Detection, and Handling
Object fault model:
Object crash failure - occurs when an object stops sending out messages; its internal state is lost
• a crash failure of an object is due to the crash of at least one element composing the object
Value faults - a message arrives in time but with the wrong content (caused by the application or the QuO runtime)
• detected by a voter
Time faults
• detected by a monitor
Leaders report faults to Proteus; Proteus will kill faulty objects if necessary and generate new objects.
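A minimal majority voter of the kind assigned to value-fault detection here, as an illustrative sketch (not AQuA's actual voter): replies that disagree with the majority value are the value faults to report.

    import java.util.*;

    class MajorityVoter {
        /** Returns the majority reply, or empty if no value has a majority. */
        static Optional<String> vote(List<String> replies) {
            Map<String, Integer> counts = new HashMap<>();
            for (String r : replies) counts.merge(r, 1, Integer::sum);
            return counts.entrySet().stream()
                    .filter(e -> e.getValue() > replies.size() / 2)
                    .map(Map.Entry::getKey)
                    .findFirst();
        }

        public static void main(String[] args) {
            // one replica returns a wrong value: the voter masks it, and the
            // dissenting replica can be reported to Proteus as value-faulty
            System.out.println(vote(List.of("42", "42", "41")));  // Optional[42]
        }
    }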
AQuA Gateway Structure
Egida
Developed at UT Austin.
An object-oriented, extensible toolkit for low-overhead fault tolerance.
Provides a library of objects that can be used to compose log-based rollback recovery protocols.
Provides a specification language to express arbitrary rollback-recovery protocols.
Log-based Rollback Recovery
Checkpointing
• independent, coordinated, or induced by specific patterns of communication
Message Logging
• pessimistic, optimistic, causal
Core Building Blocks
Almost all log-based rollback recovery protocols share an event-driven structure.
The common events are:
• Non-deterministic events (the source of determinants and, when lost, of orphans)
• Dependency-generating events
• Output-commit events
• Checkpointing events
• Failure-detection events
A grammar for specifying rollback-recovery protocols
Protocol := <non-det-event-stmt>* <output-commit-event-stmt>* <dep-gen-event-stmt> <ckpt-stmt>opt <recovery-stmt>opt

<non-det-event-stmt> := <event> : determinant : <determinant-structure>
                        <Log <event-info-list> <how-to-log> on <stable-storage>>opt

<output-commit-event-stmt> := <output-commit-proto> output commit on <event-list>

<event> := send | receive | read | write
<determinant-structure> := {source, ssn, dest, rsn}
<output-commit-proto> := independent | co-ordinated
<how-to-log> := synchronously | asynchronously
<stable-storage> := local disk | volatile memory of self
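As an illustration (our reading of the grammar, not an example taken from the slides), a pessimistic receiver-based message-logging protocol could be specified as:

    receive : determinant : {source, ssn, dest, rsn}
              <Log <message> synchronously on local disk>
    independent output commit on <send>

i.e., every receive event's determinant and contents are forced to local disk synchronously, which matches the pessimistic protocol described earlier.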
Egida Modules
• EventHandler
• Determinant
• HowToOutputCommit
• LogEventDeterminant
• LogEventInfo
• HowToLog
• WhereToLog
• StableStorage
• VolatileStorage
• Checkpointing
• …