![Page 1: Fault Tolerance Mechanisms ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1f5503460f94c37866/html5/thumbnails/1.jpg)
Fault Tolerance Mechanisms
ITV Model-based Analysis and Design of Embedded Software
Techniques and methods for Critical Software
Anders P. Ravn
Aalborg University
August 2011
Fault Tolerance
Means to isolate component faults, and to mask them
Prevents system failures
May increase system dependability
Fault Tolerance
FT - levels
• Full tolerance
• Graceful Degradation
• Fail safe
BW p. 107
FT basis: Redundancy
• Time: Try, Retry, Retry, ...
• Space: Try | Try | Try | ... (in parallel)
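Time redundancy can be sketched as a bounded retry loop. This is a minimal illustration, not part of the original slides: the class name `Retry`, the method `withRetries`, and the choice to treat any thrown exception as a failed attempt are all assumptions made for the example.

```java
import java.util.concurrent.Callable;

/** Time redundancy: repeat the same operation until it succeeds or attempts run out. */
final class Retry {
    private Retry() {}

    /**
     * Runs the operation up to maxAttempts times and returns the first successful result.
     * Assumption of this sketch: a thrown exception counts as a failed attempt.
     */
    static <T> T withRetries(Callable<T> operation, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();   // Try, then Retry, Retry, ...
            } catch (Exception e) {
                last = e;                  // remember the failure and try again
            }
        }
        throw last;                        // every attempt failed
    }
}
```

A retry of this kind can only help against transient faults; a permanent fault fails on every attempt, which is where space redundancy or alternative modules come in.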
Fault Tolerance
Basic Strategies
Dynamic Redundancy
1. Error detection
2. Damage confinement and assessment
3. Error recovery
4. Fault treatment and continued service
BW p. 114
Error Detection
f: State × Input → State × Output
• Environment (exception)
• Application
  Assertion:
  • precondition (input)
  • postcondition (input, output)
  • invariant (state, state’)
  Timing:
  • WCET(f, input)
  • Deadline(f, input)
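The application-level checks above can be sketched around a small function. This is an illustrative example only: `CheckedSqrt`, `DetectedError`, and `checkedIsqrt` are hypothetical names, the function has no state component (so the invariant check is not shown), and the deadline is checked after the call rather than enforced by a scheduler.

```java
/** Error detection by application assertions and a deadline check around a function f. */
final class CheckedSqrt {
    private CheckedSqrt() {}

    /** Thrown when an assertion or the deadline is violated (hypothetical name). */
    static final class DetectedError extends RuntimeException {
        DetectedError(String message) { super(message); }
    }

    /** f: integer square root by linear search; stands in for any application function. */
    static int isqrt(int x) {
        int r = 0;
        while ((long) (r + 1) * (r + 1) <= x) {
            r++;
        }
        return r;
    }

    /** Runs f with a precondition, a postcondition, and a deadline check. */
    static int checkedIsqrt(int x, long deadlineNanos) {
        // precondition (input)
        if (x < 0) {
            throw new DetectedError("precondition: input must be non-negative");
        }
        long start = System.nanoTime();
        int r = isqrt(x);
        long elapsed = System.nanoTime() - start;
        // postcondition (input, output): r*r <= x < (r+1)*(r+1)
        boolean post = (long) r * r <= x && x < (long) (r + 1) * (r + 1);
        if (!post) {
            throw new DetectedError("postcondition violated");
        }
        // timing: Deadline(f, input), checked after the fact in this sketch
        if (elapsed > deadlineNanos) {
            throw new DetectedError("deadline missed");
        }
        return r;
    }
}
```

Note that the postcondition here is cheap compared to the computation itself, which is what makes it useful as a detection mechanism.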
Damage Confinement
• Static structure
• Dynamic structure (transaction)
Error Recovery
• Forward: repair the state, if you can!
• Backward:
  • define recovery points
  • checkpoint state at each recovery point
  • roll back
  • retry
Caution: domino effect
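The backward recovery steps can be sketched for a single object as follows. This is a minimal illustration under stated assumptions: the class `Checkpointed` and its method `attempt` are hypothetical, the state is a simple list of integers, and only one process is involved, so the domino effect between communicating processes does not arise here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Backward error recovery: checkpoint the state, roll back if an operation fails. */
final class Checkpointed {
    private List<Integer> state = new ArrayList<>();

    /** Current state, exposed for inspection. */
    List<Integer> state() {
        return state;
    }

    /**
     * Applies op to the state. If op throws, the state saved at the
     * recovery point is restored and false is returned, so the caller can retry.
     */
    boolean attempt(Consumer<List<Integer>> op) {
        List<Integer> checkpoint = new ArrayList<>(state); // recovery point
        try {
            op.accept(state);
            return true;
        } catch (RuntimeException e) {
            state = checkpoint;                            // roll back
            return false;
        }
    }
}
```

A caller would typically follow a failed `attempt` with a retry, either of the same operation or of an alternative one.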
Recovery blocks
ENSURE acceptance_test
BY { module_1 }
ELSE BY { module_2 }
...
ELSE BY { module_m }
ELSE ERROR
BW p. 120
Implementation of Recovery Blocks
Abstract class RecoveryBlock
public abstract class RecoveryBlock implements Cloneable {
    /** acceptance test for the result, it must be implemented by the application. */
    abstract boolean acceptanceTest();

    /** method to produce the result, it must be implemented by the application.
     * @param module 0, ... , MaxBlocks-1 */
    abstract void block(int module);

    /** MaxBlocks must be set by the application to the number of blocks */
    protected int MaxBlocks;
RecoveryBlock execution
    /** method to execute recovery module 0, 1, ... MaxBlocks-1 until one succeeds
     * @throws NoAccept if no module passes acceptanceTest.
     * @throws CloneNotSupportedException if the state cannot be saved.
     */
    public final void do_it() throws NoAccept, CloneNotSupportedException {
        save();
        int i = 0;
        do {
            try {
                block(i++);
                if (acceptanceTest()) return;
            } catch (Exception e) {
                /* if the block fails, we continue - not acceptance */
            }
            restore(copy);
        } while (i < MaxBlocks);
        throw new NoAccept();
    }
}
RecoveryBlock cache
public abstract class RecoveryBlock implements Cloneable {
    /** The recovery cache is implemented by a clone of the original object */
    RecoveryBlock copy;

    /** save object to recovery cache, uses Java clone which must be a deep clone.
     * Object.clone() is shallow, so the application must override clone()
     * to copy its data deeply. */
    private final void save() throws CloneNotSupportedException {
        copy = (RecoveryBlock) this.clone();
    }

    /** method to restore data from recovery cache, it must be implemented by the application
     * @param copy the saved object whose values are to be restored */
    abstract void restore(RecoveryBlock copy);
Application
import java.awt.TextArea;

/** Extends the basic abstract RecoveryBlock with faulty sorting
 * algorithms and logs calls, returns etc. to a TextArea. */
public class RecoveringSort extends RecoveryBlock {
    /** checksum for acceptance test */
    private int checksum;
    /** data to be saved in recovery cache */
    private int[] argument;
    /** text area that receives the log output */
    private TextArea log;

    public RecoveringSort(TextArea t) {
        MaxBlocks = 3;
        log = t;
    }
Acceptance criteria
    /* Acceptance test for sorting; it shall verify:
     * 1) the return value is an ordered list,
     * 2) the return value is a permutation of the initial values */
    boolean acceptanceTest() {
        boolean result = true;
        // check ordering
        int i = argument.length - 1;
        while (i > 0) {
            if (argument[i] < argument[--i]) { result = false; break; }
        }
        // check permutation, this is a partial check through a checksum
        // A full check is as expensive computationally as sorting,
        // thus, we use a partial check.
        i = argument.length;
        int sum = 0;
        while (i > 0) sum += argument[--i];
        return result && (sum == checksum);
    }
Application - modules
    /** Starts sorting using the recovery block mechanism.
     * @param data integer array containing elements to be sorted. */
    public int[] sort(int[] data) {
        argument = data.clone(); // copy needed for recovery to work
        checksum = 0;
        int i = argument.length;
        while (i > 0) checksum += argument[--i];
        try {
            do_it();
        } catch (NoAccept e) {
            log.append("All blocks failed\n");
        } catch (CloneNotSupportedException e) {
            // do_it() declares this checked exception, so it must be handled here
            log.append("State could not be saved\n");
        }
        return argument;
    }

    void block(int i) {
        switch (i) {
            case 0: BucketSort(argument); break;
            case 1: BadSort(argument); break;
            case 2: AlmostGoodSort(argument); break;
            default: break; // no further modules
        }
    }
Fault classes (scope of R-B)
• Origin
  • physical (internal/external): +
  • logical (design/interaction): +
• Kind
  • omission: (+)
  • value: +
  • timing: +
  • byzantine: (-)
• Property
  • duration (permanent / transient): + / (+)
  • consistency (determinate / nondeterminate): + / +
  • autonomy (spontaneous / event-dependent): + / +
The ideal FT-component
[Figure: the ideal FT-component, with a normal mode part and an exception handler part. Labelled interfaces: request/response, interface exception, failure exception.]
N-version programming
V1 V2 V3
Driver (comparator)
Comparison vectors (votes)
Comparison status indicators
Comparison points
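The driver's role can be sketched as a majority voter over independently written versions. This is a simplified illustration, not the scheme from the slides in full: the class `NVersionDriver` and the method `vote` are hypothetical names, there is a single comparison point at the end of the computation, votes are compared for exact equality, and the versions run sequentially rather than concurrently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.IntUnaryOperator;

/** N-version programming: a driver runs all versions and takes a majority vote. */
final class NVersionDriver {
    private NVersionDriver() {}

    /** Returns the value a strict majority of versions agree on, or empty if there is none. */
    static Optional<Integer> vote(List<IntUnaryOperator> versions, int input) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (IntUnaryOperator version : versions) {
            try {
                // each result is one vote; equal results accumulate
                votes.merge(version.applyAsInt(input), 1, Integer::sum);
            } catch (RuntimeException e) {
                // a version that crashes casts no vote
            }
        }
        for (Map.Entry<Integer, Integer> entry : votes.entrySet()) {
            if (2 * entry.getValue() > versions.size()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With three versions, one faulty version is outvoted by the other two; if all three disagree, the driver reports that no majority exists instead of guessing.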
Fault classes (scope of N-VP)
• Origin
  • physical (internal/external): +
  • logical (design/interaction): +
• Kind
  • omission: (+)
  • value: +
  • timing: +
  • byzantine: +
• Property
  • duration (permanent / transient): + / (+)
  • consistency (determinate / nondeterminate): + / +
  • autonomy (spontaneous / event-dependent): + / +