fault tolerant computing based on diversity

18/05/2006 Fault Tolerant Computing Based on Diversity by Seda Demirağ 2005701688

Page 1: Fault Tolerant Computing Based on Diversity

18/05/2006

Fault Tolerant Computing Based on Diversity

by Seda Demirağ 2005701688

INTRODUCTION

Software faults in a real-time system include:

Concurrency-control faults: These faults involve inter-process communication and synchronization, data coherence and protection, and deadlock.

Timing faults: A task is not completed within the specified amount of time.

Error-detection and error-recovery faults: These faults occur when the detection and recovery mechanism cannot handle an error, or when it is invoked although no error exists.

INTRODUCTION

Software fault tolerance techniques:

are designed to allow a system to tolerate software faults that remain in the system after its development

provide mechanisms that let the software system prevent a fault from causing system failure

have been used mostly in the aerospace, nuclear power, healthcare, telecommunications and ground transportation industries, where failures can be catastrophic.

In this term paper, I will discuss the fault tolerance techniques based on design and data diversity.

SOFTWARE FAULT TOLERANCE TECHNIQUES: DATA AND DESIGN DIVERSITY

Multiple data representation environment:

Data diverse techniques are used in a multiple data representation environment

They utilize different representations of input data to provide tolerance to software design faults

Multiple version software environment:

Design diverse techniques are used in a multiple version software environment

They use the functionality of independently developed software versions to provide tolerance to software design faults

Design Diversity Techniques

Two or more variants of software developed by different teams but to a common specification are used.

These variants are then used in a time or space redundant manner to achieve fault tolerance.

The main disadvantage of design diversity is the high cost of developing multiple variants of the software.

Design Diversity Techniques

Popular techniques which are based on the design diversity concept for fault tolerance in software are:

Recovery Block

N-Version Programming

N-Self-Checking Programming

Design Diversity Techniques: Recovery Block (RcB)

It was introduced in 1974 by Horning et al., with early implementations developed by Randell in 1975 and Hecht in 1981

The basic RcB scheme consists of an executive, an acceptance test (AT), and primary and alternate try blocks (variants)

Many implementations of RcB, especially for real-time applications, include a watchdog timer (WDT)

The RcB is categorized as a dynamic technique: the selection of which variant's result to use is made during program execution, based on the result of the AT

Design Diversity Techniques: Recovery Block (RcB)

[Figure: structure and operation of the basic RcB technique with a watchdog timer]

The technique first attempts to satisfy the AT by using the primary try block (the primary algorithm).

If the primary algorithm’s result does not pass the AT, then the n-1 alternates are attempted in turn until an alternate’s result passes the AT. If no alternate is successful, an error occurs.
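
The try-and-test sequence described above can be sketched in Python. This is a minimal illustration, not the scheme from the original figure: the names `recovery_block`, `faulty_sort`, `simple_sort` and `is_sorted` are hypothetical, the checkpoint copy is an assumption about how state is restored between tries, and the deadline is only polled between tries rather than enforced by a real watchdog timer.

```python
import copy
import time


class RecoveryBlockFailure(Exception):
    """Raised when no variant produces a result that passes the acceptance test."""


def recovery_block(variants, acceptance_test, state, deadline_s=1.0):
    """Try the primary variant, then each alternate, until one passes the AT."""
    checkpoint = copy.deepcopy(state)            # saved state (assumed restore mechanism)
    start = time.monotonic()
    for variant in variants:                     # primary first, then the alternates
        if time.monotonic() - start > deadline_s:
            break                                # simplified stand-in for the watchdog timer
        trial_state = copy.deepcopy(checkpoint)  # every try starts from the checkpoint
        try:
            result = variant(trial_state)
        except Exception:
            continue                             # a crash counts as a failed try
        if acceptance_test(result):
            return result
    raise RecoveryBlockFailure("no variant passed the acceptance test")


def faulty_sort(data):
    return data                                  # deliberate design fault: never sorts


def simple_sort(data):
    return sorted(data)


def is_sorted(result):
    return all(a <= b for a, b in zip(result, result[1:]))


print(recovery_block([faulty_sort, simple_sort], is_sorted, [3, 1, 2]))
```

Here the primary's output fails the acceptance test, so the executive falls back to the alternate, whose output is accepted.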

Design Diversity Techniques: N-Version Programming (NVP)

NVP was suggested by Elmendorf in 1972 and developed by Avizienis and Chen in 1977–1978

In contrast to RcB, NVP is a static technique: a task is executed by several processes or programs, and a result is accepted only if it is adjudicated as acceptable, usually via a majority vote.

Design Diversity Techniques: N-Version Programming (NVP)

[Figure: structure and operation of the basic NVP technique]

The NVP technique uses a decision mechanism (DM) and forward recovery to accomplish fault tolerance.

The technique uses at least two independently designed, functionally equivalent versions (variants) of a program developed from the same specification.

The variants are run in parallel and a DM examines the results and selects the “best” result, if one exists

Design Diversity Techniques: N-Version Programming (NVP)

General syntax:

run Version 1, Version 2, ..., Version n
if (Decision Mechanism (Result 1, Result 2, ..., Result n))
    return Result
else failure exception

The NVP syntax above states that the technique executes the n versions concurrently. The results of these executions are provided to the DM, which operates upon them to determine if a correct result can be adjudicated. If one can, then it is returned. If a correct result cannot be determined, then an error occurs.
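
A minimal Python sketch of this scheme follows. It assumes a strict majority vote as the decision mechanism and uses threads as a stand-in for truly independent processes; `n_version`, `majority_vote` and the three toy versions are hypothetical names, and results are assumed to be hashable values that can be compared for exact equality.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class NoConsensus(Exception):
    """Raised when the decision mechanism cannot adjudicate a result."""


def majority_vote(results, n):
    """Decision mechanism: accept a value only if more than half of the n versions agree."""
    if results:
        value, count = Counter(results).most_common(1)[0]
        if 2 * count > n:
            return value
    raise NoConsensus("no majority among the version results")


def n_version(versions, x):
    """Run all versions on the same input and adjudicate their results."""
    def run(version):
        try:
            return (True, version(x))
        except Exception:
            return (False, None)                 # a crashed version casts no vote

    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        outcomes = list(pool.map(run, versions))
    results = [value for ok, value in outcomes if ok]
    return majority_vote(results, n=len(versions))


def square_v1(x):
    return x * x


def square_v2(x):
    return x ** 2


def square_v3(x):
    return x + x                                 # deliberate design fault


print(n_version([square_v1, square_v2, square_v3], 3))
```

Two of the three versions agree on 9, so the faulty third version is outvoted.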

Design Diversity Techniques: N Self-Checking Programming (NSCP)

NSCP is a design diverse technique developed by Laprie.

The hardware fault tolerance architecture related to NSCP is active dynamic redundancy.

A self-checking component results from either the application of an AT to a variant’s results or the application of a comparator to the results of two variants.

Design Diversity Techniques: N Self-Checking Programming (NSCP)

[Figure: structure and operation of the basic NSCP technique]

Design Diversity Techniques: N Self-Checking Programming (NSCP)

General syntax:

run Variants 1 and 2 on Hardware Pair 1,
    Variants 3 and 4 on Hardware Pair 2
compare Results 1 and 2            compare Results 3 and 4
if not (match)                     if not (match)
    set NoMatch1                       set NoMatch2
else set Result Pair 1             else set Result Pair 2
if NoMatch1 and not NoMatch2, Result = Result Pair 2
else if NoMatch2 and not NoMatch1, Result = Result Pair 1
else if NoMatch1 and NoMatch2, raise exception
else if not NoMatch1 and not NoMatch2
    then compare Result Pair 1 and 2
        if not (match), raise exception
        if (match), Result = Result Pair 1 or 2
return Result

The NSCP syntax above states that the technique executes the n variants concurrently, on n/2 hardware pairs. The results of the paired variants are compared. If a pair’s results do not match, a flag is set indicating pair failure. If a single pair failure has occurred, then the nonfailing pair’s result is returned as the NSCP result. If both pairs failed to match, then an exception is raised. If neither pair failed, then the results of the two pairs are compared with each other. If they match, the result is set to one of the matching values and returned as the NSCP result. If they do not match, then an exception is raised.
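
The pair-comparison logic can be sketched in Python for the four-variant case. This is only an illustration under simplifying assumptions: the variants run sequentially in one process instead of on separate hardware pairs, each self-checking component is built from a comparator rather than an AT, and all names (`nscp`, `self_checking_pair`, the `abs_*` variants) are hypothetical.

```python
class NSCPFailure(Exception):
    """Raised when no self-checking pair yields a trustworthy result."""


def self_checking_pair(variant_a, variant_b, x):
    """Comparator-based self-checking component: returns (matched, result)."""
    a, b = variant_a(x), variant_b(x)
    return (a == b, a)


def nscp(pair1, pair2, x):
    """Adjudicate between two self-checking pairs, following the syntax above."""
    ok1, result1 = self_checking_pair(*pair1, x)
    ok2, result2 = self_checking_pair(*pair2, x)
    if ok1 and not ok2:
        return result1                           # only pair 1 is consistent
    if ok2 and not ok1:
        return result2                           # only pair 2 is consistent
    if not ok1 and not ok2:
        raise NSCPFailure("both pairs failed their comparison")
    if result1 != result2:
        raise NSCPFailure("pairs are internally consistent but disagree")
    return result1


def abs_v1(x):
    return abs(x)


def abs_v2(x):
    return x if x >= 0 else -x


def abs_v3(x):
    return max(x, -x)


def abs_v4(x):
    return x                                     # deliberate design fault for negative x


print(nscp((abs_v1, abs_v2), (abs_v3, abs_v4), -5))
```

For the input -5 the second pair disagrees internally, so the result of the first pair is returned.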

Data Diversity Techniques

Data diversity, a technique for fault tolerance in software, was introduced by Ammann and Knight.

While the design diversity approaches rely on multiple versions of the software written to the same specification, the data diversity approach uses only one version of the software.

This approach relies on the observation that software sometimes fails for certain values in the input space, and that such a failure can be averted by a minor perturbation of the input data that is still acceptable to the software.

Data Diversity Techniques

This technique is cheaper to implement than the design diversity technique.

Popular techniques which are based on the data diversity concept for fault tolerance in software are:

Retry Blocks

N-Copy Programming

Data Diversity Techniques: Retry Blocks (RtB)

A retry block is a modification of the recovery block structure that uses data diversity instead of design diversity.

Rather than the multiple alternate algorithms used in a recovery block, a retry block uses only one algorithm.

A retry block's acceptance test has the same form and purpose as a recovery block's acceptance test.

Data Diversity Techniques: Retry Blocks (RtB)

[Figure: structure and operation of the basic RtB technique]

A retry block executes the single algorithm normally and evaluates the acceptance test. If the acceptance test passes, the retry block is complete.

If the acceptance test fails, the algorithm executes again after the data have been reexpressed. The system repeats this process until it violates a deadline or produces a satisfactory output.

Data Diversity Techniques: Retry Blocks (RtB)

General syntax:

ensure Acceptance Test
by Primary Algorithm (Original Input)
else by Primary Algorithm (Re-expressed Input)
else by Primary Algorithm (Re-expressed Input)
...
...
[Deadline Expires]
else by Backup Algorithm (Original Input)
else failure exception

The RtB syntax above states that the technique will first attempt to ensure the AT by using the primary algorithm. If the primary algorithm’s result does not pass the AT, then the input data will be re-expressed and the same algorithm attempted again, until a result passes the AT or the watchdog timer (WDT) deadline expires. If the deadline expires, the backup algorithm is invoked with the original inputs. If this backup algorithm is not successful, an error occurs.
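
A hedged Python sketch of the retry loop follows. The names `retry_block`, `sinc`, `safe_sinc`, `plausible` and `nudge` are hypothetical, the re-expression is an approximate one (a small perturbation of the input), and the deadline is polled between tries instead of being enforced by a real watchdog timer.

```python
import math
import time


class RetryBlockFailure(Exception):
    """Raised when neither the primary nor the backup algorithm succeeds."""


def retry_block(primary, backup, acceptance_test, reexpress, x, deadline_s=0.1):
    """Retry one algorithm on re-expressed input until the AT passes or time runs out."""
    start = time.monotonic()
    candidate = x                                    # the first try uses the original input
    while time.monotonic() - start < deadline_s:     # polled deadline, not a real WDT
        try:
            result = primary(candidate)
            if acceptance_test(result):
                return result
        except Exception:
            pass                                     # a crash counts as a failed try
        candidate = reexpress(candidate)             # same algorithm, perturbed input
    try:
        result = backup(x)                           # deadline expired: original input
    except Exception as exc:
        raise RetryBlockFailure("backup algorithm crashed") from exc
    if acceptance_test(result):
        return result
    raise RetryBlockFailure("backup result failed the acceptance test")


def sinc(x):
    return math.sin(x) / x                           # deliberate fault: fails at x == 0


def safe_sinc(x):
    return 1.0 if x == 0 else math.sin(x) / x


def plausible(result):
    return math.isfinite(result) and abs(result) <= 1.0 + 1e-9


def nudge(x):
    return x + 1e-9                                  # minor perturbation of the input


value = retry_block(sinc, safe_sinc, plausible, nudge, 0.0)
print(value)
```

The first try divides by zero, the input is nudged slightly, and the second try passes the acceptance test without the backup being needed.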

Data Diversity Techniques: N-Copy Programming (NCP)

An N-copy system is similar to an N-version system but uses data diversity instead of design diversity.

N copies of a program execute in parallel, each on a set of data produced by reexpression.

The system selects the output to be used by an enhanced voting scheme.

Data Diversity Techniques: N-Copy Programming (NCP)

[Figure: structure and operation of the basic NCP technique]

The NCP technique uses a decision mechanism (DM) and forward recovery to accomplish fault tolerance.

The technique uses one or more data re-expression algorithms (DRAs) and at least two copies of a program.

The system inputs are run through the DRA(s) to re-express the inputs.

The copies execute in parallel using the re-expressed data as input.

A DM examines the results of the copy executions and selects the “best” result, if one exists.

Data Diversity Techniques: N-Copy Programming (NCP)

The basic NCP technique consists of an executive, 1 to n DRAs, n copies of the program or function, and a DM. The executive orchestrates the NCP technique operation, which has the general syntax:

run DRA 1, DRA 2, ..., DRA n
run Copy 1(result of DRA 1),
    Copy 2(result of DRA 2), ...,
    Copy n(result of DRA n)
if (Decision Mechanism (Result 1, Result 2, ..., Result n))
    return Result
else failure exception

The NCP syntax above states that the technique first runs the DRAs concurrently to re-express the input data, then executes the n copies concurrently.

The results of the copy executions are provided to the DM, which operates upon the results to determine if a correct result can be adjudicated.

If one can (i.e., the Decision Mechanism statement above evaluates to TRUE), then it is returned. If a correct result cannot be determined, then an error occurs.
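
The NCP flow can be sketched in Python as follows. This is an illustrative sketch only: it uses a plain strict-majority vote in place of the enhanced voting scheme mentioned earlier, threads instead of independent processors, and exact re-expressions (reorderings of a list whose maximum does not depend on order). The names `n_copy`, `find_max` and `DRAS` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class NCopyFailure(Exception):
    """Raised when the decision mechanism cannot adjudicate a result."""


def n_copy(program, dras, x):
    """Run one program on n re-expressions of the input and vote on the outputs."""
    inputs = [dra(x) for dra in dras]            # one re-expressed input per copy

    def run(data):
        try:
            return (True, program(data))
        except Exception:
            return (False, None)                 # a crashed copy casts no vote

    with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
        outcomes = list(pool.map(run, inputs))
    results = [value for ok, value in outcomes if ok]
    if results:
        value, count = Counter(results).most_common(1)[0]
        if 2 * count > len(inputs):              # strict majority of all copies
            return value
    raise NCopyFailure("no majority among the copy results")


def find_max(values):
    best = values[1]                             # deliberate fault: ignores values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


# Exact re-expressions: the maximum is the same for every ordering of the list.
DRAS = [list, lambda v: list(reversed(v)), sorted]

print(n_copy(find_max, DRAS, [9, 2, 5]))
```

The copy that receives the original ordering returns the wrong answer, but the two copies running on reordered data agree and win the vote.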

Environment Diversity Techniques

Environment diversity is the newest approach to fault tolerance in software.

The environment diversity approach requires re-executing the software in a different environment.

Transient faults typically occur in computer systems when design faults in software result in unacceptable and erroneous states in the OS environment.

When the software fails, it is restarted in a different, error-free OS environment state, which is achieved by some cleanup operations.
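
The restart-after-cleanup idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions: a dictionary stands in for the OS environment state, and `run_with_environment_diversity`, `task`, `cleanup` and the `stale_lock` key are hypothetical names chosen for the example.

```python
def run_with_environment_diversity(task, env, cleanup, max_restarts=2):
    """Re-execute the same task after cleaning up the environment it runs in."""
    for attempt in range(max_restarts + 1):
        try:
            return task(env)
        except Exception:
            if attempt == max_restarts:
                raise                            # out of restarts: report the failure
            cleanup(env)                         # move to a clean environment state


def task(env):
    if env.get("stale_lock"):
        raise RuntimeError("resource is locked by a dead process")
    return "done"


def cleanup(env):
    env["stale_lock"] = False                    # clear the erroneous state


environment = {"stale_lock": True}               # toy stand-in for the OS environment
print(run_with_environment_diversity(task, environment, cleanup))
```

The first execution fails because of the erroneous environment state, and the unchanged task succeeds once the cleanup has run.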

CONCLUSION

Many techniques have been developed for achieving fault tolerance in software.

The application of all of these techniques is relatively new to the area of fault tolerance.

Furthermore, each technique needs to be tailored to the particular application.

This tailoring should also be based on the cost of the fault tolerance effort required by the customer.

The differences among the techniques provide some flexibility of application.

REFERENCES

[1] “Data Diversity: An Approach to Software Fault Tolerance”; P. E. Ammann and J. C. Knight; IEEE Transactions on Computers, Vol. 37, No. 4, April 1988, pp. 418-425.

[2] “Software Fault Tolerance”; Chris Inacio; Carnegie Mellon University, 18-849b Dependable Embedded Systems, Spring 1998.

[3] “Design Diversity: an Update from Research on Reliability Modelling”; Peter Popov, Bev Littlewood, Lorenzo Strigini; Safety Critical Symposium 2001 (Springer 2001).

[4] “Modelling software design diversity: a review”; B. Littlewood, P. Popov, and L. Strigini; ACM Computing Surveys, 33(2), 2001, pp. 177-208.

[5] “A Survey of Software Fault Tolerance Techniques”; Zaipeng Xie, Hongyu Sun, Kewal Saluja.

Thank You!!

Any Questions?