
Synthesis of Fault-Tolerant Distributed Programs

Ali Ebnenasir

Department of Computer Science and Engineering, Michigan State University

East Lansing, MI 48824, [email protected]

Advisor: Dr. Sandeep S. Kulkarni

2

Motivation
Programs are subject to unanticipated faults.

When new classes of faults are identified, the corresponding fault-tolerance must be added.

How can fault-tolerance be added?
  Design a fault-tolerant program from scratch, or
  add fault-tolerance incrementally.

How can correctness be ensured?
  Verification after the fact, or
  automatic synthesis of fault-tolerant programs (correct by construction).

3

Motivation (Continued)
Synthesis of fault-tolerant programs can start from:
  a (temporal logic) specification, or
  the fault-intolerant program.

Synthesizing fault-tolerant programs from their fault-intolerant versions has the potential to:
  reuse the behaviors of the fault-intolerant program, and
  preserve behaviors that are hard to specify (e.g., efficiency).

Problem: the complexity of synthesis. The algorithm for synthesizing fault-tolerant distributed programs in [FTRTFT00] is a non-deterministic polynomial-time algorithm, so a deterministic implementation can take exponential time.

4

Outline

Program and Fault Model

Distribution Model

Problem Statement

Strategy

Current Results

Future Plan

5

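Program and Fault Model
A program is identified by its state space and its set of transitions:
  Finite state space Sp
  Invariant S and fault-span T, where S ⊆ T ⊆ Sp
  Program p, faults f, and the safety specification are each a set of transitions, i.e., a subset of { (s0, s1) | (s0, s1) ∈ Sp × Sp }

Fault-tolerance: satisfy a particular fault-tolerance specification in the presence of faults. Levels: failsafe, nonmasking, and masking.

[Figure: nested sets S ⊆ T ⊆ Sp; S is closed in p, and T is closed in p and f together.]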
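To make the model concrete, here is a minimal Python sketch, assuming states are plain hashable values and p and f are Python sets of (s0, s1) pairs. It computes the smallest fault-span: every state reachable from S by program or fault transitions. The model allows any T ⊇ S that is closed in p and f together; the smallest such set is just one choice. The function names are hypothetical.

```python
from collections import defaultdict, deque

def is_closed(states, transitions):
    """A set of states is closed in a set of transitions if no transition leaves it."""
    return all(s1 in states for (s0, s1) in transitions if s0 in states)

def smallest_fault_span(invariant, program, faults):
    """Smallest T with S ⊆ T that is closed in p ∪ f: all states reachable
    from S using program or fault transitions (breadth-first search)."""
    successors = defaultdict(list)
    for s0, s1 in program | faults:
        successors[s0].append(s1)
    span = set(invariant)
    frontier = deque(invariant)
    while frontier:
        state = frontier.popleft()
        for nxt in successors[state]:
            if nxt not in span:
                span.add(nxt)
                frontier.append(nxt)
    return span
```

For example, with S = {0, 1}, p = {(0, 1), (1, 0), (2, 1), (3, 1)}, and f = {(1, 2)}, the sketch returns {0, 1, 2}. State 3 is excluded because nothing reachable from S leads to it, even though p has a transition out of it.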

6

Distribution Model
Read/write restrictions. Example:
  A program p with two processes j and k, and two Boolean variables a and b. Process j cannot read b.
  Can j include the transition (a=0, b=0) → (a=1, b=0)?
  Only if it also includes the transition (a=0, b=1) → (a=1, b=1), because j cannot tell the two source states apart.
Hence groups of transitions (instead of individual transitions) must be chosen.
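The grouping in this example can be computed mechanically. The following is a minimal sketch, not the framework's code. It assumes states are dictionaries from variable names to values. It also assumes that a process cannot write a variable it cannot read, so the unreadable variables keep their value and may hold any value in their domain. The name `transition_group` is hypothetical.

```python
from itertools import product

def transition_group(s0, s1, variables, readable, domains):
    """Return every transition that must be included together with (s0, s1)
    when the executing process cannot read variables outside `readable`.
    States are dicts; transitions are returned as pairs of value tuples
    ordered by `variables`. Assumption: unreadable variables are not written."""
    unreadable = [v for v in variables if v not in readable]
    if any(s0[v] != s1[v] for v in unreadable):
        raise ValueError("transition writes a variable the process cannot read")
    group = []
    # Each assignment to the unreadable variables yields one group member.
    for values in product(*(domains[v] for v in unreadable)):
        t0, t1 = dict(s0), dict(s1)
        for v, value in zip(unreadable, values):
            t0[v] = t1[v] = value
        group.append((tuple(t0[v] for v in variables),
                      tuple(t1[v] for v in variables)))
    return group
```

For process j, which reads a but not b, the call `transition_group({"a": 0, "b": 0}, {"a": 1, "b": 0}, ["a", "b"], {"a"}, {"a": [0, 1], "b": [0, 1]})` yields both transitions from the slide: ((0, 0), (1, 0)) and ((0, 1), (1, 1)).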

7

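Problem Statement

Input to the synthesis algorithm: the fault-intolerant program p, its specification Spec, its invariant S, and faults f, over a finite state space Sp and subject to distribution restrictions.

Output: a fault-tolerant program p' with invariant S' such that:
  S' ⊆ S;
  no new transitions are added inside S' (p'|S' ⊆ p|S');
  new transitions may be added outside S';
  p' tolerates faults f with respect to Spec.

[Figure: S' ⊆ S ⊆ Sp, with program transitions p and fault transitions f.]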
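The structural conditions on S' and p' can be checked directly on explicit sets. This minimal sketch rests on two assumptions. First, p|S is read as the transitions of p with both endpoints in S. Second, it also checks that S' is closed in p', as any invariant must be. It does not check the fault-tolerance condition itself. The function names are hypothetical.

```python
def projection(transitions, states):
    """p|S, read here (assumption) as transitions with both endpoints in S."""
    return {(s0, s1) for (s0, s1) in transitions if s0 in states and s1 in states}

def check_structure(p, S, p_new, S_new):
    """Check S' ⊆ S, p'|S' ⊆ p|S', and that S' is closed in p'."""
    if not S_new <= S:
        return False
    # No new transitions inside the new invariant.
    if not projection(p_new, S_new) <= projection(p, S_new):
        return False
    # No transition of p' may leave S'.
    return all(s1 in S_new for (s0, s1) in p_new if s0 in S_new)
```

With p = {(0, 1), (1, 0), (1, 2), (2, 0)} and S = {0, 1, 2}, the candidate S' = {0, 1} with p' = {(0, 1), (1, 0), (3, 0)} passes: (3, 0) lies outside S', where new transitions are allowed. Keeping (1, 2) in p' fails, because it leaves S'.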

8

Strategy

Theoretical issues:
  Develop heuristics.
  Explore polynomial-time boundaries.
  Analyze fault-intolerant programs.

Develop a synthesis framework for:
  Developers of fault-tolerance.
  Developers of heuristics.

9

Theoretical Issues - Heuristics

Apply heuristics to reduce the exponential complexity [SRDS01]

Assign weights to transitions and states based on their usefulness

Different approaches for resolving deadlocks and livelocks

Identify the applicability of heuristics to the problem at hand

Choose different subsets of heuristics and apply them in different orders.
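One simple deadlock-resolution approach is to eliminate deadlock states rather than add recovery from them. The following minimal sketch repeatedly drops states outside the invariant that have no outgoing program transition, along with the program transitions leading into them. Removing those transitions can create new deadlocks, so the process iterates. This is an illustration under simplifying assumptions, not the heuristics of [SRDS01]. It ignores fault transitions, which cannot be removed, and it ignores distribution restrictions, under which transitions must be removed in groups. The name `eliminate_deadlocks` is hypothetical.

```python
def eliminate_deadlocks(program, span, invariant):
    """Repeatedly drop deadlock states outside the invariant (states with no
    outgoing program transition) and the program transitions that reach them.
    Returns the reduced (program, span)."""
    program, span = set(program), set(span)
    while True:
        has_successor = {s0 for (s0, _) in program}
        deadlocks = {s for s in span - invariant if s not in has_successor}
        if not deadlocks:
            return program, span
        span -= deadlocks
        program = {(s0, s1) for (s0, s1) in program if s1 not in deadlocks}
```

For example, with program {(0, 0), (1, 0), (2, 3)}, span {0, 1, 2, 3}, and invariant {0}, state 3 is removed first. That makes state 2 a deadlock, so it is removed next, leaving the program {(0, 0), (1, 0)} on states {0, 1}.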

10

Theoretical Issues – Polynomial-Time Boundary

Find properties of programs/specifications where polynomial-time synthesis is possible

Example: algorithmic synthesis of failsafe fault-tolerant programs is NP-hard [ICDCS02].

Polynomial-time synthesis of failsafe fault-tolerance is possible for monotonic programs and specifications.

11

Example for Polynomial-Time Boundary: Monotonicity of Specifications

Definition: A specification spec is positive monotonic with respect to a Boolean variable x iff, for every s0, s1, s'0, s'1 such that
  the values of all other variables in s0 and s'0 are the same, and
  the values of all other variables in s1 and s'1 are the same,
the following holds:
  if the transition (s0, s1), with x = false in both s0 and s1, does not violate safety,
  then the transition (s'0, s'1), with x = true in both s'0 and s'1, does not violate safety.
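For small state spaces, the definition can be checked by brute force. This minimal sketch assumes the safety specification is given as a hypothetical predicate `violates(s0, s1)` on dictionary states. It enumerates every pair of assignments to the other variables, which is exponential in the number of variables. It illustrates the definition and is not a synthesis procedure.

```python
from itertools import product

def is_positive_monotonic_spec(violates, variables, domains, x):
    """Check that whenever (s0, s1) with x false in both is safe, the same
    transition with x true in both is also safe. States are dicts; `domains`
    gives the values of every variable other than the Boolean x."""
    others = [v for v in variables if v != x]
    assignments = list(product(*(domains[v] for v in others)))
    for vals0, vals1 in product(assignments, repeat=2):
        base0, base1 = dict(zip(others, vals0)), dict(zip(others, vals1))
        safe_when_false = not violates({**base0, x: False}, {**base1, x: False})
        if safe_when_false and violates({**base0, x: True}, {**base1, x: True}):
            return False
    return True
```

For instance, a hypothetical rule "a process that is not Byzantine (b = false) must not change d" is positive monotonic in b. Setting b to true can only remove violations, so the check returns True.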

12

Example for Polynomial-Time Boundary: Monotonicity of Programs

Definition: A program p with invariant S is negative monotonic with respect to a Boolean variable x iff, for every s0, s1, s'0, s'1 such that
  the values of all other variables in s0 and s'0 are the same, and
  the values of all other variables in s1 and s'1 are the same,
the following holds:
  if (s0, s1), with x = true in both s0 and s1, is a transition of p in the invariant S,
  then (s'0, s'1), with x = false in both s'0 and s'1, is also a transition of p.
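Because s'0 and s'1 are fully determined by s0 and s1 (only x changes), the definition can be checked with one pass over the transitions of p. This minimal sketch assumes states are tuples of variable values, with x at a given index. It also assumes "in the invariant" means both endpoints lie in S. The name `is_negative_monotonic_program` is hypothetical.

```python
def is_negative_monotonic_program(p, invariant, x_index):
    """p: set of (s0, s1) transitions on tuple states; invariant: set of states.
    For every transition of p inside S with x true at both ends, the same
    transition with x false at both ends must also be in p."""
    def set_x(state, value):
        return state[:x_index] + (value,) + state[x_index + 1:]
    for s0, s1 in p:
        in_invariant = s0 in invariant and s1 in invariant
        if in_invariant and s0[x_index] and s1[x_index]:
            if (set_x(s0, False), set_x(s1, False)) not in p:
                return False
    return True
```

With states (x, y), the program {((True, 0), (True, 1)), ((False, 0), (False, 1))} passes. Dropping the second transition makes the check fail.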

13

Example for Polynomial-Time Boundary: Theorem

Synthesis of failsafe fault-tolerance can be done in polynomial time if either:

Program is negative monotonic and Spec is positive monotonic; or

Program is positive monotonic and Spec is negative monotonic.

If only one of the two requirements is satisfied (only the program or only the specification is monotonic), synthesizing failsafe fault-tolerance is still NP-hard.

For many problems, e.g., agreement, consensus, and commit, these requirements are easily met.

14

Example for Polynomial-Time Boundary: Byzantine Agreement

Processes: the general g and three non-generals j, k, and l.

Variables (d: decision, where ┴ means undecided; b: Byzantine; f: finalized):
  d.g : {0, 1}
  d.j, d.k, d.l : {0, 1, ┴}
  b.g, b.j, b.k, b.l : {true, false}
  f.j, f.k, f.l : {0, 1}

Fault-intolerant program transitions of process j:
  d.j = ┴ /\ f.j = 0 → d.j := d.g
  d.j ≠ ┴ /\ f.j = 0 → f.j := 1

Fault transitions:
  ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l → b.j := true
  b.j → d.j := 0|1

[Figure: the general g and the non-generals j, k, and l.]
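The guarded commands above translate directly into code. This minimal sketch makes several assumptions. States are dictionaries keyed by the slide's variable names, and ┴ is encoded as Python's None. Only the faults of non-generals are modeled. The program runs round-robin until no action is enabled. All function names are hypothetical.

```python
BOT = None  # stands for ┴ ("no decision yet"); this encoding is an assumption
NON_GENERALS = ("j", "k", "l")

def initial_state(dg):
    """General decided dg; non-generals undecided, not finalized, not Byzantine."""
    s = {"d.g": dg, "b.g": False}
    for p in NON_GENERALS:
        s[f"d.{p}"], s[f"b.{p}"], s[f"f.{p}"] = BOT, False, 0
    return s

def program_step(s, p):
    """Fire an enabled fault-intolerant action of non-general p, else return None."""
    if s[f"d.{p}"] is BOT and s[f"f.{p}"] == 0:
        return {**s, f"d.{p}": s["d.g"]}      # d.p := d.g
    if s[f"d.{p}"] is not BOT and s[f"f.{p}"] == 0:
        return {**s, f"f.{p}": 1}             # f.p := 1
    return None

def fault_become_byzantine(s, p):
    """Fault: if no process is Byzantine yet, p may become Byzantine."""
    if not any(s[f"b.{q}"] for q in ("g",) + NON_GENERALS):
        return {**s, f"b.{p}": True}
    return None

def fault_change_decision(s, p, value):
    """Fault: a Byzantine non-general p may set d.p to 0 or 1."""
    return {**s, f"d.{p}": value} if s[f"b.{p}"] else None

def run(s):
    """Execute program actions round-robin until no action is enabled."""
    changed = True
    while changed:
        changed = False
        for p in NON_GENERALS:
            nxt = program_step(s, p)
            if nxt is not None:
                s, changed = nxt, True
    return s
```

Without faults, `run(initial_state(1))` ends with every non-general holding decision 1 and finalized. The fault functions return None when their guard is false, mirroring a disabled action.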

15


Byzantine Agreement (Continued): Safety Specification

Agreement: no two non-Byzantine non-generals can finalize with different decisions.

Validity: if g is not Byzantine, each non-Byzantine non-general process should finalize with the same decision as g.

Read/write restrictions:
  Process j can read b.j, d.j, f.j, d.g, d.k, and d.l.
  Process j can write d.j and f.j.
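The two safety conditions can be written as predicates on a single state. This minimal sketch uses the same dictionary encoding as the variable names above. It assumes (not stated on the slide) that a transition violates safety when it reaches a state where either predicate fails. The helper names are hypothetical.

```python
NON_GENERALS = ("j", "k", "l")

def finalized_honest(s):
    """Non-generals that are not Byzantine and have finalized (f = 1)."""
    return [p for p in NON_GENERALS if not s[f"b.{p}"] and s[f"f.{p}"] == 1]

def agreement(s):
    """No two finalized non-Byzantine non-generals hold different decisions."""
    return len({s[f"d.{p}"] for p in finalized_honest(s)}) <= 1

def validity(s):
    """If g is not Byzantine, every finalized non-Byzantine non-general decided d.g."""
    if s["b.g"]:
        return True  # no obligation when the general is Byzantine
    return all(s[f"d.{p}"] == s["d.g"] for p in finalized_honest(s))
```

A Byzantine non-general that finalizes with a different decision violates neither predicate, because both quantify only over non-Byzantine processes.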

16


Byzantine Agreement (Continued)

Observation 1: The specification is positive monotonic with respect to b.j.

Observation 2: The program consisting of the transitions of j is negative monotonic with respect to b.k.

Observation 3: The specification is negative monotonic with respect to f.j.

Observation 4: The program consisting of the transitions of j is positive monotonic with respect to f.k.

17


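Byzantine Agreement (Continued)

Failsafe fault-tolerant program (transitions of process j):
  d.j = ┴ /\ f.j = 0 → d.j := d.g
  d.j ≠ ┴ /\ ((d.j = d.k) \/ (d.j = d.l)) /\ f.j = 0 → f.j := 1

Process j now finalizes only if at least one other non-general, k or l, holds the same decision.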
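The only change from the fault-intolerant program is the stronger guard on finalization. This minimal sketch compares the two guards on dictionary states, with ┴ encoded as None (an assumption). It shows the difference between the guards. It does not prove that the result is failsafe. The function names are hypothetical.

```python
BOT = None  # stands for ┴ (assumed encoding)
NON_GENERALS = ("j", "k", "l")

def may_finalize_intolerant(s, p):
    """Guard of f.p := 1 in the fault-intolerant program."""
    return s[f"d.{p}"] is not BOT and s[f"f.{p}"] == 0

def may_finalize_failsafe(s, p):
    """Failsafe guard: additionally, some other non-general holds the same decision."""
    others = [q for q in NON_GENERALS if q != p]
    return may_finalize_intolerant(s, p) and any(s[f"d.{q}"] == s[f"d.{p}"] for q in others)
```

If j has decided 1 while k has decided 0 and l is still undecided, the intolerant guard lets j finalize but the failsafe guard does not. Once l also decides 1, the failsafe guard becomes true.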

18

Theoretical Issues – Analysis of Fault-Intolerant Programs

Analyze the behavior and the structure of the fault-intolerant program.

Example: reasoning about the program in high atomicity, i.e., without distribution restrictions; enhancement of fault-tolerance [ICDCS03].

Take advantage of model checkers.

19

Theoretical Issues – Analysis of Fault-Intolerant Programs

[Figure: interaction between the synthesis framework and the SPIN model checker. Components: fault-intolerant program, intermediate program in Promela, counterexample, fault-tolerant program.]

20

Theoretical Issues: Current Results

Intolerant program

Masking fault-tolerant [FTRTFT00]

Failsafe fault-tolerant [ICDCS02]

Nonmasking fault-tolerant [ICDCS03]

21

Synthesis Framework

Goals:
  Algorithmic synthesis of fault-tolerant programs from their fault-intolerant versions.
  Easy integration of new heuristics.
  Easy modification of its implementation.

Users: developers of fault-tolerance and developers of heuristics.

Examples:
  A canonical version of Byzantine agreement.
  An agreement program that is subject to Byzantine and failstop faults (1.3 million states).
  A token ring program perturbed by state-corruption faults.

22

Related Work

E.A. Emerson and E.M. Clarke, Using branching time temporal logic to synthesize synchronization skeletons, 1982.

Z. Manna and P. Wolper, Synthesis of communicating processes from temporal logic specifications, 1984.

A. Arora, P.C. Attie, and E.A. Emerson, Synthesis of fault-tolerant concurrent programs, 1998.

P.C. Attie and E.A. Emerson, Synthesis of concurrent programs for an atomic read/write model of computation, 1996.

O. Kupferman and M. Vardi, Synthesis with incomplete information, 1997.

23

Future Plan

Theoretical issues:
  Develop more intelligent heuristics to reduce the chance of failure in synthesis.
  Find the polynomial-time boundary for other levels of fault-tolerance.

Synthesis framework issues:
  Scalability of the synthesis framework to larger programs.
  Implementation of the synthesis algorithm on a distributed platform.

24

Future Plan (Continued)

Synthesis framework issues: use model checkers for behavioral analysis.
  Query: the intermediate program and a reachability analysis from a given state.
  Result set: deadlock states, non-progress cycles, or a finite sequence of states.
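The kind of query described above can be sketched on an explicit state graph. This is a minimal stand-in for what a model checker such as SPIN would compute, not its interface. It assumes the program is given as a successor dictionary. It also assumes a "non-progress cycle" means a reachable cycle that avoids a given set of progress states, for example the invariant. The name `analyze_from` is hypothetical.

```python
def analyze_from(start, successors, progress_states):
    """Reachability query from `start` on a graph given as dict state -> successors.
    Returns (reachable states, deadlock states, has_non_progress_cycle)."""
    reachable, stack = {start}, [start]
    while stack:
        for nxt in successors.get(stack.pop(), []):
            if nxt not in reachable:
                reachable.add(nxt)
                stack.append(nxt)
    deadlocks = {s for s in reachable if not successors.get(s)}

    # Cycle detection by DFS, restricted to reachable non-progress states.
    candidates = reachable - set(progress_states)
    color = {s: "white" for s in candidates}  # grey = on the current DFS path

    def dfs_finds_cycle(u):
        color[u] = "grey"
        for v in successors.get(u, []):
            if v in candidates and (color[v] == "grey" or
                                    (color[v] == "white" and dfs_finds_cycle(v))):
                return True
        color[u] = "black"
        return False

    has_cycle = any(color[s] == "white" and dfs_finds_cycle(s) for s in candidates)
    return reachable, deadlocks, has_cycle
```

For the graph {0: [1], 1: [2], 2: [1]} with progress state 0, the query from 0 reaches {0, 1, 2}, finds no deadlock, and reports the cycle between 1 and 2 as a non-progress cycle.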

25

Publications

[ICDCS02] Sandeep S. Kulkarni and Ali Ebnenasir. The Complexity of Adding Failsafe Fault-Tolerance. The 22nd International Conference on Distributed Computing Systems, July 2-5, 2002, Vienna, Austria.

[ICDCS03] Sandeep S. Kulkarni and Ali Ebnenasir. Enhancing the Fault-Tolerance of Nonmasking Programs. Accepted to the 23rd International Conference on Distributed Computing Systems, May 19-22, 2003, Providence, Rhode Island, USA.

[SRDS03] Sandeep S. Kulkarni and Ali Ebnenasir. A Framework for Automatic Synthesis of Fault-Tolerance. Submitted to the 22nd Symposium on Reliable Distributed Systems, October 6-8, 2003, Florence, Italy.

The implementation of the synthesis framework: http://www.cse.msu.edu/~sandeep/software/Code/synthesis-framework/

26

Thank You!

Questions and Comments?

27

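Reduction from 3-SAT

[Figure: construction for the reduction from 3-SAT, with states labeled a0, an, x0, x1, ..., xn, and x'0, x'1, ..., x'n. Some transitions are included iff x0 is false and others iff x0 is true; for a clause cj = xj \/ xk \/ xl, transitions are included iff xj is false, iff xk is true, and iff xl is false.]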
