enhancing the fault-tolerance of nonmasking programs
Post on 31-Dec-2015
Enhancing The Fault-Tolerance of Nonmasking Programs
Sandeep S. Kulkarni and Ali Ebnenasir
Software Engineering and Network Systems Laboratory
Computer Science and Engineering Department
Michigan State University
Acknowledgement
This work is partially sponsored by: NSF, DARPA NEST, ONR URI, and Michigan State University
Motivation
Programs are subject to unanticipated faults
  Encounter new classes of faults, add corresponding fault-tolerance
How to add fault-tolerance?
  Develop from scratch (expensive approach)
  Incrementally add fault-tolerance
    Reuse of the behaviors of the fault-intolerant program
    Potential to preserve properties that are hard to specify (e.g., efficiency)
How to ensure correctness?
  After-the-fact verification
  Automatic addition of fault-tolerance (correct by construction)
Motivation (Continued)
Problem: Complexity of automatic addition
  Automatic addition of fault-tolerance to distributed programs is NP-hard [FTRTFT00], [ICDCS02]
How do we deal with this complexity?
  Develop heuristics
  Identify the boundary of polynomial-time addition
  Step-wise addition (weaker forms of fault-tolerance)
The goal of this paper
  Enhance the fault-tolerance of nonmasking programs
  Partial automation of fault-tolerance programs
Outline
Preliminary Concepts
Enhancement Problem
Enhancement in High Atomicity Model
Enhancement for Distributed Programs
Example: Byzantine Agreement Program
Conclusion and Future Work
Preliminary Concepts: Programs and Faults
Finite state space Sp
Invariant S, fault-span T ⊆ Sp
Program p, Fault f, Safety ⊆ { (s0, s1) | (s0, s1) ∈ Sp × Sp }
Fault-tolerance: Failsafe, Nonmasking, Masking
[Figure: state space Sp with nested regions S and T; labels p/f, p, and f mark program and fault transitions.]
Step-Wise Addition
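[Figure: step-wise addition among the intolerant program, failsafe fault-tolerant, nonmasking fault-tolerant, and masking fault-tolerant programs; steps are labeled [FTRTFT00], [ICDCS02], and "This paper" (nonmasking to masking).]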
Enhancement Problem
Synthesis algorithm
  Input: nonmasking program p, specification Spec, invariant S, faults f
  Output: masking program p', invariant S', fault-span T'
Requirements: Only fault-tolerance is added; no new functional behavior is added
S ' = T ' ST '
Enhancement in High Atomicity Model
High atomicity model: each process can read/write all program variables
[Figure: fault-span T containing invariant S and the set ms, with fault transitions f]
ms: states from which safety will be violated by fault transitions
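The definition of ms can be sketched as a backward fixpoint over the fault transitions. This is a minimal illustration, not the authors' implementation: it assumes safety is given as a set `bad` of forbidden transitions, and it reads ms as the states from which a sequence of fault transitions alone violates safety. The set mt, which the slides name but do not define, is assumed here to be the transitions that violate safety or lead into ms. All function and variable names are hypothetical.

```python
def compute_ms(fault, bad):
    """States from which fault transitions alone can violate safety.

    fault: set of (s0, s1) fault transitions
    bad:   set of (s0, s1) transitions forbidden by the safety specification
    """
    ms = set()
    changed = True
    while changed:  # iterate until no new state is added (least fixpoint)
        changed = False
        for (s0, s1) in fault:
            if s0 not in ms and ((s0, s1) in bad or s1 in ms):
                ms.add(s0)
                changed = True
    return ms


def compute_mt(transitions, ms, bad):
    """Assumed reading of mt: transitions that violate safety or reach ms."""
    return {(s0, s1) for (s0, s1) in transitions if (s0, s1) in bad or s1 in ms}


# Hypothetical example: a fault chain a -> b -> c whose last step is unsafe.
fault = {("a", "b"), ("b", "c")}
bad = {("b", "c")}
ms = compute_ms(fault, bad)                               # {"a", "b"}
mt = compute_mt({("x", "a"), ("x", "y")}, ms, bad)        # {("x", "a")}
```

The program transition ("x", "a") is excluded because a fault could continue from "a" to a safety violation without the program taking another step.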
Enhancement in High Atomicity Model – (Continued)
[Figure: fault-span T containing invariant S and ms, with the new fault-span T' and invariant S']
• Deadlock states appear due to the removal of some transitions
Find a state predicate T' such that:
  T' is closed in the computations of the program in the presence of faults
  The specification is satisfied from every state of T' (i.e., no deadlocks)
Construct p' such that for every (s0, s1) ∈ p':
  (s0, s1) does not violate safety
  s0 ∈ T' ⇒ s1 ∈ T'
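The two conditions on T' (closure under program and fault transitions, and absence of deadlocks) suggest a shrinking fixpoint. The sketch below is one possible reading under stated assumptions, not the authors' ConstructFaultSpan: safety is a set `bad` of forbidden transitions, ms is already computed, and every state of the original program has at least one outgoing program transition (terminating states carry a self-loop). The function name and parameters are hypothetical.

```python
def construct_fault_span(T, p, f, ms, bad):
    """Shrink T until it is closed under faults and free of deadlocks.

    Returns (T', p') where p' keeps only safe program transitions inside T'.
    An empty T' means this sketch found no masking program.
    """
    span = set(T) - set(ms)
    while True:
        # Program transitions that stay inside span and do not violate safety.
        kept = {(a, b) for (a, b) in p
                if a in span and b in span and (a, b) not in bad}
        # States from which a fault transition leaves span (closure fails).
        escaping = {a for (a, b) in f if a in span and b not in span}
        # States with no remaining outgoing program transition (deadlocks).
        deadlocked = span - {a for (a, _) in kept}
        to_remove = escaping | deadlocked
        if not to_remove:
            return span, kept
        span -= to_remove


# Hypothetical example: state 3 only has an unsafe way out, so it is removed.
T = {0, 1, 2, 3}
p = {(0, 0), (1, 0), (2, 1), (3, 0)}
bad = {(3, 0)}
span, p_new = construct_fault_span(T, p, {(0, 1), (1, 2)}, set(), bad)
# span == {0, 1, 2}; p_new == {(0, 0), (1, 0), (2, 1)}
```

If a fault transition (2, 3) is added to the example, removing state 3 forces the removal of 2, then 1, then 0, and the result is the empty set.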
Enhancement:
HighAtomicityEnhancement (p, f: transitions,
                          T: StatePredicate, specification spec) {
  1. Calculate ms; Calculate mt;
  2. T' = ConstructFaultSpan( );
  3. if ( T' = {} ) declare no masking f-tolerant program exists; exit;
     else Construct the transitions of p';
}

Addition:
AddMasking (p, f: transitions, S: StatePredicate,
            specification spec) {
  1. Calculate ms; Calculate mt;
  2. . . .
  3. . . .
  4. repeat
       4-1) . . .
       4-2) . . .
       4-3) T := ConstructFaultSpan( );
       4-4) . . .
       4-5) if (S = {} \/ T = {}) declare no masking f-tolerant program exists; exit;
     until (ExitConditionHolds);
  5. Remove cycles outside the invariant in T;
  6. Construct the transitions of p';
}
[Figure: partial automation. Labels: fault-intolerant program, nonmasking program, masking program; steps marked "Manual" and "Automatic: Enhancement"; [FTRTFT00].]
Enhancement For Distributed Programs
Difficulties with Distribution
Read/write restrictions (low atomicity model)
A program p:
  Two processes j, k
  Two Boolean variables a and b
  Process j cannot read b
Can we include the following transition?
  (a=0, b=0) → (a=1, b=0)
Only if we also include the transition
  (a=0, b=1) → (a=1, b=1)
Groups of transitions (instead of individual transitions) must be chosen.
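The grouping requirement can be made concrete with a small sketch. It enumerates the transitions that a process cannot tell apart from a given one because they differ only in variables it cannot read. The sketch assumes Boolean variables and assumes a process does not write a variable it cannot read; `VARIABLES` and `group_of` are hypothetical names.

```python
from itertools import product

VARIABLES = ("a", "b")  # a state is a tuple (a, b) of 0/1 values


def group_of(transition, unreadable):
    """Transitions that must be included or excluded together.

    transition: (s0, s1), each a tuple of values ordered as VARIABLES
    unreadable: names of the variables the executing process cannot read
    """
    s0, s1 = transition
    hidden = [i for i, name in enumerate(VARIABLES) if name in unreadable]
    # Assumption: unreadable variables are also unwritable, so they stay fixed.
    if any(s0[i] != s1[i] for i in hidden):
        raise ValueError("transition changes a variable the process cannot read")
    group = set()
    for values in product((0, 1), repeat=len(hidden)):
        t0, t1 = list(s0), list(s1)
        for i, value in zip(hidden, values):
            t0[i] = value
            t1[i] = value
        group.add((tuple(t0), tuple(t1)))
    return group


# Process j cannot read b, so (a=0,b=0) -> (a=1,b=0) comes with its b=1 twin.
group = group_of(((0, 0), (1, 0)), {"b"})
# group == {((0, 0), (1, 0)), ((0, 1), (1, 1))}
```

A synthesis step that works on such groups keeps or discards a whole group at once, which is why one unsafe member can rule out every transition in its group.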
Enhancement of Nonmasking Distributed Programs
[Flowchart]
Start
1. Calculate T' high
2. Calculate S' init; set S' low = S' init
3. Calculate Sreachable from S' low by fault/program transitions
4. Sreachable = {} ?
     Yes: T' = S' low; calculate p' transitions; stop
     No: continue with step 5
5. Calculate Srecovery from where recovery is possible to S' low
   (search in (T' high – S' low) under distribution restrictions)
6. Srecovery = {} ?
     Yes: declare failure
     No: S' low = S' low ∪ Srecovery; return to step 3
A High Atomicity Fault-Span
The largest possible domain for the states that can be included in the fault-span of the distributed program
S' high = S ∩ T' high
[Figure: regions S, T' high, and ms]
The Initial Low Atomicity Invariant
Remove states from which an outgoing transition crosses the boundary of S' high (e.g., s0)
Removal is a non-deterministic choice when there is more than one state to remove
[Figure: regions T' high, S' high, and S' init, with example state s0]
Single-Step Reachable States
States reachable by a fault/program transition (denoted Sreachable)
[Figure: S' init and Sreachable inside T' high; states s0, s1, s2, s3; fault transition f]
Single-Step Recovery States
Safer recovery in a single step (denoted Srecovery)
Goal: infinite computations are possible from all states in S' low
s0 represents a typical recovery state
[Figure: S' init and Srecovery inside S' low; states s0, s2, s3]
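The iteration over S' low, Sreachable, and Srecovery can be sketched as a loop over plain sets. This is a simplified reading, not the authors' algorithm: it ignores the distribution restrictions (transitions are treated individually rather than in groups), it takes T' high and S' init as given, and it treats a state as a recovery state when it has a safe program transition directly into S' low. All names are hypothetical.

```python
def enhance_low_atomicity(t_high, s_init, p, f, bad):
    """Grow S' low from S' init until no state outside it is reachable.

    Returns the final S' low (used as T'), or None when the search fails.
    """
    s_low = set(s_init)
    steps = set(p) | set(f)
    while True:
        # States outside S' low reachable in one fault/program step.
        s_reachable = {b for (a, b) in steps if a in s_low and b not in s_low}
        if not s_reachable:
            return s_low
        # Search T' high - S' low for states with a safe single-step recovery.
        candidates = set(t_high) - s_low
        s_recovery = {a for (a, b) in p
                      if a in candidates and b in s_low and (a, b) not in bad}
        if not s_recovery:
            return None  # declare failure
        s_low |= s_recovery


# Hypothetical example: faults push 0 -> 1 -> 2; the program recovers 2 -> 1 -> 0.
result = enhance_low_atomicity(
    t_high={0, 1, 2}, s_init={0},
    p={(0, 0), (1, 0), (2, 1)}, f={(0, 1), (1, 2)}, bad=set())
# result == {0, 1, 2}
```

Each pass either stops or adds at least one state to S' low, so on a finite state space the loop terminates.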
Example: Byzantine Agreement
Why this example?
  Was used to illustrate the addition of masking fault-tolerance in [SRDS01]
  Manual enhancement has already been applied [TSE98]
Processes: a general g and three non-generals j, k, and l
Variables
  d.g : {0, 1}
  d.j, d.k, d.l : {0, 1, ┴}
  b.g, b.j, b.k, b.l : {0, 1}
  f.j, f.k, f.l : {0, 1}
Safety Specification:
  Agreement: No two non-Byzantine non-generals can finalize with different decisions
  Validity: If g is not Byzantine, no process can finalize with a decision different from that of g
  A finalized process should not execute any transition
[Figure: general g and non-generals j, k, and l]
Example: Byzantine Agreement
Read/Write restrictions
  Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l
  Process j can write d.j, f.j
Dijkstra's guarded commands
  Guard → Statement denotes { (s0, s1) | Guard holds at s0 and atomic execution of Statement yields s1 }
Nonmasking fault-tolerant program transitions
  d.j = ┴ ∧ f.j = 0 → d.j := d.g
  d.j ≠ ┴ ∧ f.j = 0 → f.j := 1
  d.j = 1 ∧ d.k = 0 ∧ d.l = 0 → d.j := 0
  d.j = 0 ∧ d.k = 1 ∧ d.l = 1 → d.j := 1
Fault transitions
  ¬b.g ∧ ¬b.j ∧ ¬b.k ∧ ¬b.l → b.j := true
  b.j → d.j := 0|1
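The guarded-command semantics (a guard that holds at s0, and an atomic statement that yields s1) can be illustrated by encoding the four nonmasking actions of process j as a successor function. The sketch reads each action as the conjunction of its listed conditions, represents a state as a dictionary, and uses `None` for ┴; these encoding choices and the name `actions_of_j` are assumptions made for illustration.

```python
BOTTOM = None  # stands for the undecided value ┴


def actions_of_j(s):
    """Successor states of s under process j's nonmasking actions.

    s maps variable names ("d.g", "d.j", "d.k", "d.l", "f.j") to values.
    Each enabled guarded command contributes one successor state.
    """
    successors = []
    # d.j = ┴ and f.j = 0  ->  d.j := d.g   (copy the general's decision)
    if s["d.j"] is BOTTOM and s["f.j"] == 0:
        successors.append({**s, "d.j": s["d.g"]})
    # d.j != ┴ and f.j = 0  ->  f.j := 1    (finalize)
    if s["d.j"] is not BOTTOM and s["f.j"] == 0:
        successors.append({**s, "f.j": 1})
    # d.j = 1 and d.k = 0 and d.l = 0  ->  d.j := 0   (recovery)
    if s["d.j"] == 1 and s["d.k"] == 0 and s["d.l"] == 0:
        successors.append({**s, "d.j": 0})
    # d.j = 0 and d.k = 1 and d.l = 1  ->  d.j := 1   (recovery)
    if s["d.j"] == 0 and s["d.k"] == 1 and s["d.l"] == 1:
        successors.append({**s, "d.j": 1})
    return successors


undecided = {"d.g": 1, "d.j": BOTTOM, "d.k": BOTTOM, "d.l": BOTTOM, "f.j": 0}
outvoted = {"d.g": 0, "d.j": 1, "d.k": 0, "d.l": 0, "f.j": 0}
first = actions_of_j(undecided)   # one successor: d.j becomes 1
second = actions_of_j(outvoted)   # two successors: finalize, or correct d.j to 0
```

In the second state both finalization and recovery are enabled, so the nonmasking program may finalize with a value that differs from the other non-generals.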
Example: Byzantine Agreement (Continued)
[Figure: state-transition diagram over states S0 to S4.
 State labels: (d.j = d.k = ┴, d.g = 1, d.l = 1, f.l = 0); (d.j = d.k = ┴, d.g = 1, d.l = 1, f.l = 1); (d.j = d.k = ┴, d.g = 0, d.l = 1, f.l = 1), shown for both S2 and S3; S4: (d.j = d.k = 0, d.g = 0, d.l = 1, f.l = 1).
 Annotations: a good transition inside the invariant; fault transition; b.g = 1; premature finalization; a deadlock state.]
Why is enhancement easier?
Example: Byzantine Agreement (Continued)
d.j = ┴ ∧ f.j = 0 → d.j := d.g
d.j ≠ ┴ ∧ f.j = 0 → f.j := 1
d.j = 1 ∧ d.k = 0 ∧ d.l = 0 → d.j := 0
d.j = 0 ∧ d.k = 1 ∧ d.l = 1 → d.j := 1
((d.j = d.k) (d.j = d.l))
(f.j = 0)
(f.j = 0)
Masking fault-tolerant program
High atomicity reasoning
  Synthesize a masking program in high atomicity and then refine it to a distributed program
Enhancement vs. Addition
Reuse the computations of the nonmasking program
Reasoning in high atomicity model has the potential to reduce the complexity of addition
Synthesis Framework
Development of a synthesis framework
Developers of fault-tolerance can interactively add fault-tolerance to fault-intolerant programs
Partial automation helps us to reap the benefits of automation as much as possible
Enhancement identifies programs where partial automation is possible
Implementation of enhancement algorithms in the synthesis framework
http://www.cse.msu.edu/~sandeep/software/Code/synthesis-framework/
Conclusion and Future Work
Enhancement simplifies automated design of masking programs
  Lower asymptotic complexity
  Polynomial-time enhancement in the low atomicity model (in the state space of the nonmasking program)
  Sound, but not complete
Reasoning in high atomicity simplifies the synthesis of masking distributed programs
Future Work:
  A polynomial-time sound and complete enhancement algorithm for a restricted class of programs and specifications
Thank You!
Questions?
Example: Triple Modular Redundancy
Processes: Three processes: j, k, and l
Variables and their domains
  in.j, in.k, and in.l are Boolean variables
  out belongs to { 0, 1, ┴ }
Nonmasking program (+ addition in modulo 3):
  N1: (out = ┴) → out := in.j
  N2: (out != ┴) /\ (out != in.j) /\ ((in.j = in.k) \/ (in.j = in.l)) → out := in.j
Faults:
  F: (in.j = in.k) /\ (in.j = in.l) → in.j := 0|1
Safety specification:
  Do not reach states where out differs from the majority of inputs.
  out should not be changed after it is assigned a value.
Example: Triple Modular Redundancy
Invariant:
  S = ((out = ┴) /\ (in.j = in.k = in.l)) \/ (out = in.j = in.k) \/ (out = in.j = in.l) \/ (out = in.k = in.l)
Fault-span:
  T = ( (in.j = in.k = in.l) => ((out = ┴) \/ (out = in.j = in.k = in.l)) )
Enhancement algorithm:
  Compute ms: ms = { }
  Remove bad transitions: {t: t violates safety} and {t: t reaches ms}
  Construct a new fault-span T':
    T' = T - { s: (out != ┴) /\ (out is not equal to majority of inputs) }
Masking program:
  M1: (out = ┴) /\ ((in.j = in.k) \/ (in.j = in.l)) → out := in.j
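A brute-force check over all eight input combinations shows why the strengthened guard of M1 is enough for safety. The sketch assumes the guard is read as (out = ┴) /\ ((in.j = in.k) \/ (in.j = in.l)), uses `None` for ┴, and compares M1 with the unguarded action N1; the helper names are hypothetical.

```python
from itertools import product

BOTTOM = None  # stands for the unassigned value ┴


def majority(in_j, in_k, in_l):
    """Majority value of three Boolean (0/1) inputs."""
    return 1 if in_j + in_k + in_l >= 2 else 0


def m1_enabled(out, in_j, in_k, in_l):
    """Guard of M1: out is unassigned and in.j agrees with another input."""
    return out is BOTTOM and (in_j == in_k or in_j == in_l)


def unsafe_assignments(guard):
    """Inputs for which 'guard -> out := in.j' assigns a non-majority value."""
    return [(j, k, l) for j, k, l in product((0, 1), repeat=3)
            if guard(BOTTOM, j, k, l) and j != majority(j, k, l)]


def n1_enabled(out, in_j, in_k, in_l):
    """Guard of N1: out is unassigned."""
    return out is BOTTOM


m1_violations = unsafe_assignments(m1_enabled)   # []
n1_violations = unsafe_assignments(n1_enabled)   # [(0, 1, 1), (1, 0, 0)]
```

N1 can copy a corrupted in.j into out, whereas M1 is disabled exactly in the two cases where in.j disagrees with both other inputs.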
Note on the enhancement flowchart for distributed programs: S' low = S' init at the first iteration