Reliability study of an embedded operating system for industrial applications
Pardo, J., Campelo, J.C., Serrano, J.J.
Juan Pardo
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain
SEPT’04, WSRS '04

Research Objectives
• Critical industrial applications and fault-tolerant applications need operating systems (OS) that guarantee correct and safe behaviour despite the appearance of errors.
• Software fault injection techniques can be used to validate the behaviour of an operating system in the presence of errors.
• These techniques corrupt the information passed in some of the operating system calls, to see how the system reacts to invalid or corrupted values at the kernel calls.

Research Objectives
• The research work presented covers the development of, and the results from, software fault injection in an embedded system composed of a Real-Time Operating System (RTOS) and a microcontroller.
• A software fault injection tool has been developed. The proposed methodology treats the operating system as a black box whose source code is not available.
• To this end, a layer between the operating system and the application to be executed has been developed.
• The OS error detection coverage has been measured, and the OS critical data structures that should be improved are discussed, in order to improve the final robustness of the operating system.

Introduction
• Computer software is involved in many aspects of our lives.
• Despite its enormous expansion, it is still far from perfect.
• Tests are required to measure the quality of software.
• Fault tolerance deals with software's ability to hide problems, specifically the effects of faults [Voas98].
• Robustness is the degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions.
• Robustness can thus be viewed as an indication of the OS capacity to resist or react to faults induced by the applications running on top of it, or originating from the hardware layer or from device drivers [DBench02].

Introduction
Fault Tolerant System
• Fault tolerance is intended to preserve the delivery of correct service in the presence of active faults. It is generally implemented by error detection and subsequent system recovery.
• A fault tolerant system is able to continue working despite the appearance of errors.
• Safe behaviour: a known state which does not pose any risk to the system.
Dependability
• To avoid the loss of human lives or significant economic losses
• Quality of the final products
• Validation before going to market

Introduction
Dependability
• Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
• Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
• Threats: Faults, Errors, Failures
Dependability: Dependability of a computing system is the ability to deliver service that can justifiably be trusted (A. Avizienis, J.-C. Laprie, B. Randell).

State of the art: Fault Injection
Fault injection techniques:
• FI on simulated models: VHDL simulation models, other languages
• FI on prototypes:
  Hardware injection (HWIFI): external and internal; HWIFI at pin level, electromagnetic perturbations, heavy ion radiation, laser radiation, scan chain
  Software injection (SWIFI): time level (static, dynamic); high level, machine language
Injection objectives:
• Prediction
• Elimination

Advantages & drawbacks (SWIFI)
Advantages:
• Total control over when and where to inject (controllability)
• Simulation of higher-level faults
• Reduced cost
• Higher reachability
• Higher portability (flexibility)
• Low risk of damaging the circuit under test
• Easy automation of the injection campaigns
• Good observability: processors increasingly include internal tools for debugging

Advantages & drawbacks (SWIFI)
Drawbacks:
• There are zones which software cannot reach.
• Less precise timing measurements: interference with the system, overload, etc.
• Injection and activation agents overload the system.
Runtime injection:
• Little intrusion
• Objective: minimize the overload
• Drawback for RTOS
• Easy automation of injection campaigns
Pre-runtime injection:
• Less intrusion

SW Fault Injection
SW fault injection tools:
• FIAT: Fault Injection Based Automated Testing Environment, Carnegie Mellon University.
• EFI, PROFI: Processor Fault Injector, Dortmund University.
• FERRARI: Fault and ERRor Automatic Real-time Injector, Texas University.
• SFI, DOCTOR: integrateD sOftware implemented fault injeCTiOn enviRonment, Michigan University.
• FINE: Fault Injection and moNitoring Environment, Illinois University.
• FTAPE: Fault Tolerance and Performance Evaluator, Illinois University.
• XCEPTION: Coimbra University.
• MAFALDA, MAFALDA-RT: Microkernel Assessment by Fault injection AnaLysis and Design Aid, LAAS-CNRS, Toulouse.
• BALLISTA: Carnegie Mellon University.

Tools
• MicroC/OS-II RTOS
• Infineon C166 microcontroller
• Tasking compiler, debugger, etc.

C166 family members shown: C161, C163, C164, C165, C166, C167

Applications for the C166 family:
• Industrial control: robotics, PLCs, servo drives, motor control, power inverters, machine-tool control (CNC)
• Automotive: engine management, transmission control, ABS/ASK, active suspension
• Consumer: DVD / CD-ROM, TV / monitor, VCR / satellite receiver, set-top box, games, video surveillance
• Telecom / Datacom: communication boards (LAN), modems, PBX, mobile communication
• EDP: hard disk drives, tape drives, printers, scanners, digital copiers, fax machines

[Figure: block diagram of the microcontroller: core (CPU), ROM / Flash, RAM, XRAM, ports, oscillator, WDT, interrupt unit with PEC control, CAPCOM 1+2, ADC, PWM, USART, SSC, GPT 1+2, CAN, external bus control, X-Bus]

Infineon microcontroller characteristics:
• 16 bits, high performance
• On-chip CMOS
• 16.5 MIPS, 25/33 MHz
• Advantages from CISC and RISC
• High functionality for peripherals
• Typical for automotive applications

COTS components
• The main motivation for using Commercial Off-The-Shelf (COTS) components in a system design is the notable cost reduction in the development of the final product.
• COTS components are a cost-effective means of rapid prototyping of complex software systems.
• On the other hand, COTS software components raise serious certification problems because their design process is unknown.

COTS components
• COTS software is composed of general-purpose components which have poor dependability specifications.
• Usually, COTS components are like a black box: the source code is not available and their internal architecture (structure and data flow) is not adequately documented.

µC/OS-II Operating System
• It was selected because it has been widely used for several years.
• First version: MicroC/OS, 1992.
• Used in industrial robots, motor control, medical instruments, etc.
• It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.
• All Modified Condition Decision Coverage (MCDC) code in MicroC/OS-II has been removed, improving code quality for RTCA / EUROCAE DO-178B Level A-certified environments for avionics applications.
Validated Software Comp.

µC/OS-II: Characteristics
• Portable: µC/OS-II is written in highly portable ANSI C, with target microprocessor-specific code written in assembly language.
• ROMable: µC/OS-II was designed for embedded applications. This means that if you have the proper tool chain (i.e., C compiler, assembler, and linker/locator), you can embed µC/OS-II as part of a product.
• Scalable: it is possible to use only the services needed in the application. This reduces the amount of memory (both RAM and ROM) needed. Scalability is accomplished with the use of conditional compilation.
• Preemptive: µC/OS-II is a fully preemptive real-time kernel. This means that µC/OS-II always runs the highest priority task that is ready.
• Multitasking: µC/OS-II can manage up to 64 tasks; however, the current version of the software reserves eight of these tasks for system use. This leaves your application up to 56 tasks. Each task has a unique priority assigned to it, which means that µC/OS-II cannot do round-robin scheduling.
(Jean J. Labrosse)

µC/OS-II: Characteristics
• Deterministic: the execution time of all µC/OS-II functions and services is deterministic. You can always know how much time µC/OS-II will take to execute a function or a service. Furthermore, the execution time of all µC/OS-II services does not depend on the number of tasks running in your application.
• Task stacks: each task requires its own stack; µC/OS-II allows each task to have a different stack size. This allows you to reduce the amount of RAM needed in your application.
• Services: system services such as mailboxes, queues, semaphores, fixed-sized memory partitions, time-related functions, etc.
• Interrupt management: interrupts can suspend the execution of a task. If a higher priority task is awakened as a result of the interrupt, the highest priority task will run as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.
• Robust and reliable: µC/OS-II is based on µC/OS, which has been used in hundreds of commercial applications since 1992.
(Jean J. Labrosse)

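The "preemptive" and "deterministic" characteristics can be illustrated with a small sketch of a ready list for 64 unique priorities. It assumes, as in µC/OS-II, that a lower number means a higher priority. The class `ReadyList`, the helper `lowest_set_bit` and the two-level bitmap layout are illustrative assumptions, not code taken from the kernel or from the slides. The point is that the lookup cost does not depend on how many tasks are ready.

```python
def lowest_set_bit(value: int) -> int:
    """Index of the least significant set bit of a non-zero integer."""
    return (value & -value).bit_length() - 1


class ReadyList:
    """Ready set for 64 unique priorities, stored as an 8 x 8 bitmap."""

    def __init__(self) -> None:
        self.group = 0          # bit y is set while row y holds a ready task
        self.table = [0] * 8    # bit x of row y is set while priority 8*y + x is ready

    def make_ready(self, prio: int) -> None:
        if not 0 <= prio < 64:
            raise ValueError("priority must be in 0..63")
        self.group |= 1 << (prio >> 3)
        self.table[prio >> 3] |= 1 << (prio & 7)

    def make_not_ready(self, prio: int) -> None:
        row = prio >> 3
        self.table[row] &= ~(1 << (prio & 7))
        if self.table[row] == 0:
            self.group &= ~(1 << row)   # the row is empty, clear its group bit

    def highest_ready(self) -> int:
        """Return the numerically lowest (highest priority) ready task."""
        if self.group == 0:
            raise LookupError("no task is ready")
        row = lowest_set_bit(self.group)
        return (row << 3) + lowest_set_bit(self.table[row])
```

Two bit lookups are enough whatever the number of ready tasks, which is one way a kernel can keep scheduling time constant.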
Black-box approach
• The aim was to use a black-box approach for the OS study.
• The OS source code was therefore not modified, to keep intrusion in the OS behaviour to a minimum.
• To this end, a layer named Meta-Kernel was developed between the OS and the application to be executed.
• Through this layer, faults were injected into any of the parameters of the system calls to measure the OS robustness.
• In black-box testing, input is fed into a program and the output is checked. What goes on inside the program (the black box) is unimportant (Voas98).
COTS SW

System Design
• MicroC/OS-II OS: black box, OS source code not modified
• Injector: layer between the OS and the application
• Injection on the parameters of system calls

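The interposition idea can be sketched as follows. This is a minimal illustration and not the authors' Meta-Kernel: the class `MetaKernel`, its methods and the stand-in service `fake_time_delay` are hypothetical names. The application calls the layer instead of the OS, and the layer forwards the call unchanged unless a fault is armed for that call.

```python
from typing import Callable, List, Optional, Tuple

Fault = Tuple[str, int, int]  # (system call name, parameter index, bit position)


class MetaKernel:
    """Hypothetical layer placed between the application and the OS services."""

    def __init__(self) -> None:
        self.pending: Optional[Fault] = None   # fault waiting to be injected
        self.injected: List[Fault] = []        # faults already injected

    def arm(self, fault: Fault) -> None:
        """Schedule a single fault for the next matching system call."""
        self.pending = fault

    def call(self, name: str, service: Callable[..., int], *params: int) -> int:
        """Forward a system call, corrupting one parameter if a fault matches."""
        if self.pending is not None and self.pending[0] == name:
            _, index, bit = self.pending
            corrupted = list(params)
            corrupted[index] ^= 1 << bit       # flip exactly one bit
            params = tuple(corrupted)
            self.injected.append(self.pending)
            self.pending = None                # transient: injected only once
        return service(*params)                # the OS itself is left untouched


def fake_time_delay(ticks: int) -> int:
    """Stand-in for an OS service: returns 0 if the argument is valid, 1 otherwise."""
    return 0 if 0 < ticks <= 255 else 1
```

Because the corruption happens before the call reaches the OS, the OS code needs no changes, which is what makes the approach black-box.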
Injector Attributes
Software fault injection attributes:
• Objectives: fault prediction, fault removal
• Time: pre-runtime, runtime
• Faults: level, localization, persistence, type, duration
• Multiplicity: number of faults injected simultaneously in each experiment
• Workload: real applications, benchmarks, synthetic programs
Injector attributes:
• Prediction, elimination
• Pre-runtime and runtime
• High level
• Transient faults
• Change of one bit at the system calls (bit-flip)
• One fault injected in each experiment
• Workload for tool testing

Workload Design
Characteristics:
• Maximum use of system calls
• System calls for synchronization, semaphores, memory, queues, messages, task handling, timing management, etc.
• Open module in which calculations can be included
• Workload for testing the injection tool and the OS

Workload Design
• The system workload ran continuously and consisted of a series of tasks executing the application.
• In parallel, an injection agent was in charge of injecting faults and invalid values at the kernel calls in order to monitor the system robustness.

Errors Classification
• Errors which could affect the system
• Classification according to the detection mechanisms
• Measurements of error detection coverage and latency times
Events after the fault injection:
• Detected errors: OS error code, C167 error code, application error
• No error (correct result): system call not used; system call used but the injection has no effect
• Others: Not Safe Faults (NFS)

Injection Model
• The faultload is the most critical dimension of an OS benchmark and, more generally, of any dependability benchmark.
• Two techniques can be used to corrupt system call parameters: the 'bit-flip technique', which systematically flips bits of the target parameters, and the 'selective substitution technique', in which invalid data values are introduced into the system call parameters.
• Studies have demonstrated the equivalence of the errors provoked by the two techniques [DBench02].

Injection Model
Bit-flip technique. The following are randomly chosen at runtime:
1. System call
2. Parameter
3. Bit
Bit flips model the consequences of physical faults:
• Electromagnetic interference (EMI)
• Noise
• Hardware faults
• ...

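The three random choices listed above can be sketched as below. The catalogue `SYSTEM_CALLS` (call names with assumed parameter counts), the 16-bit parameter width and the function names are illustrative assumptions; the slides do not give the tool's actual selection code.

```python
import random
from typing import Dict, List, Tuple

WORD_BITS = 16  # assumption: parameters are 16 bits wide

# Hypothetical catalogue: system call name -> number of parameters.
SYSTEM_CALLS: Dict[str, int] = {
    "OSSemPend": 3,
    "OSTimeDly": 1,
    "OSTaskCreate": 4,
}


def choose_fault(rng: random.Random) -> Tuple[str, int, int]:
    """Pick the system call, then the parameter, then the bit to corrupt."""
    call = rng.choice(sorted(SYSTEM_CALLS))
    parameter = rng.randrange(SYSTEM_CALLS[call])
    bit = rng.randrange(WORD_BITS)
    return call, parameter, bit


def apply_bit_flip(params: List[int], parameter: int, bit: int) -> List[int]:
    """Return a copy of params with a single bit inverted in one parameter."""
    corrupted = list(params)
    corrupted[parameter] ^= 1 << bit
    return corrupted
```

Seeding the generator, for example `random.Random(2004)`, makes an injection campaign repeatable, which helps when a hang has to be reproduced.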
Analysis of the obtained results
Codification of the different output values:
• D0: No error, correct output (the fault injection did not affect the system).
• D1: Error detected by the operating system (µC/OS-II error code).
• D2: Error detected by the application (the application result was not correct).
• D3: Error which caused the system to hang (system failure).
• D4: Error detected by the microcontroller.

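One way to apply this codification to a single experiment is sketched below. The `Observation` fields and, in particular, the order in which the conditions are checked are assumptions made for illustration; the slides do not state how an experiment that matches several conditions was coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What was observed after one injection (hypothetical record layout)."""
    os_error_code: Optional[int] = None  # error code returned by the OS, if any
    mcu_trap: bool = False               # error signalled by the microcontroller
    system_hung: bool = False            # the system stopped responding
    output_correct: bool = True          # the application produced the right result


def classify(obs: Observation) -> str:
    """Map an observation to D0..D4 (precedence order is an assumption)."""
    if obs.system_hung:
        return "D3"
    if obs.os_error_code is not None:
        return "D1"
    if obs.mcu_trap:
        return "D4"
    if not obs.output_correct:
        return "D2"
    return "D0"
```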
Analysis of the obtained results
[Bar chart: percentage of experiments per outcome (DETECC), D0 to D4]

DETECC   Frequency   Percentage   Valid percentage   Cumulative percentage
D0       756         65.7         65.7               65.7
D1       241         21.0         21.0               86.7
D2       23          2.0          2.0                88.7
D3       101         8.8          8.8                97.5
D4       29          2.5          2.5                100.0
Total    1150        100.0        100.0

Coverage [Powell95, Constantinescu95]:
• Complete system (µC/OS-II + microcontroller): C_cs = D0 + D1 + D2 + D4 = 65.7 + 21 + 2 + 2.5 = 91.2 %
• Operating system (µC/OS-II): C_OS = D0 + D1 = 86.7 %

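The two coverage figures can be recomputed from the frequencies in the table. The sketch below uses only the counts reported on the slide; the names `counts`, `percentage`, `c_cs` and `c_os` are introduced here for illustration.

```python
# Frequencies reported for each outcome category.
counts = {"D0": 756, "D1": 241, "D2": 23, "D3": 101, "D4": 29}
total = sum(counts.values())


def percentage(categories):
    """Share of experiments, in percent, that ended in the given categories."""
    return 100.0 * sum(counts[c] for c in categories) / total


# Complete system: every outcome except the hang (D3) counts as covered.
c_cs = percentage(["D0", "D1", "D2", "D4"])
# Operating system alone: no effect, or error reported by the OS itself.
c_os = percentage(["D0", "D1"])

print(f"C_cs = {c_cs:.1f} %")  # prints "C_cs = 91.2 %"
print(f"C_OS = {c_os:.1f} %")  # prints "C_OS = 86.7 %"
```

The uncovered remainder of the complete system is the D3 share: 101 of 1150 experiments, about 8.8 %, ended in a hang.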
Analysis of the obtained results
Error detection latencies
• Time between the injection and detection by the OS
• Mean value obtained: 304 μs
• One built-in timer of the microcontroller was used to measure latencies
• High precision

Descriptive statistics for LATENC (N = 241; values in ms, consistent with the 304 μs mean):

Statistic                               Value        Std. error
Mean                                    0.30422573   1.97E-02
95% CI for the mean, lower bound        0.26533773
95% CI for the mean, upper bound        0.34311372
5% trimmed mean                         0.27924537
Median                                  0.12800000
Variance                                9.392E-02
Std. deviation                          0.30646466
Minimum                                 0.102400
Maximum                                 0.972800
Range                                   0.870400
Interquartile range                     0.49920000
Skewness                                1.213        0.157
Kurtosis                                -0.287       0.312

[Box plot: LATENC, N = 241, axis from 0.0 to 1.2]

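The standard error and the 95% confidence interval in the table follow from the mean, the standard deviation and N. A short sketch of that arithmetic is given below. The Student t quantile of 1.97 for 240 degrees of freedom is an assumed rounded value, and the millisecond unit is inferred from the 304 μs mean.

```python
import math

N = 241                    # number of latency measurements
MEAN_MS = 0.30422573       # mean latency (ms, unit inferred)
STD_DEV_MS = 0.30646466    # standard deviation
T_975 = 1.97               # assumption: t quantile for 240 degrees of freedom

# Standard error of the mean: s / sqrt(N).
std_error = STD_DEV_MS / math.sqrt(N)

# Two-sided 95% confidence interval for the mean.
half_width = T_975 * std_error
ci_low = MEAN_MS - half_width
ci_high = MEAN_MS + half_width

print(f"standard error = {std_error:.4f} ms")
print(f"95% CI = [{ci_low:.3f}, {ci_high:.3f}] ms")
```

The median (0.128 ms) is well below the mean, in line with the positive skewness: most errors are detected quickly and a minority take much longer.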
Other Results
Frequency table of the most typical error codes given by the OS:

Error code   Frequency   Percentage   Cumulative percentage   Name
E1           111         41.1         41.1                    OS_ERR_EVENT_TYPE
E11          14          5.2          46.3                    OS_MEM_INVALID_PART
E40          8           3.0          49.3                    OS_TASK_DEL_ERR
E41          3           1.1          50.4                    OS_PRIO_ERR
E42          69          25.6         75.9                    OS_PRIO_INVALID
E60          13          4.8          80.7                    OS_TASK_DEL_ERR
E81          11          4.1          84.8                    OS_TIME_INVALID_MINUTES
E82          2           0.7          85.6                    OS_TIME_INVALID_SECONDS
E83          10          3.7          89.3                    OS_TIME_INVALID_MILLI
Ex           29          10.7         100.0                   NO CODE
Total        270         100.0

• 'E1' was the most typical error code. It corresponds to 'OS_ERR_EVENT_TYPE' and was produced when the fault was injected into a semaphore, message queue or mailbox. The system reacted by hanging.
• The second most frequent error code, 'E42' ('OS_PRIO_INVALID'), was obtained when the fault was injected into system calls for task management.

Other Results
Error Propagation
After the injection campaigns it was possible to see how errors propagated through the system. For each experiment, the corrupted system call was recorded, together with the system call that finally detected the error and the time the system took to detect it.

[Table: cross-tabulation of the corrupted system call (LLAMAD, rows) against the detecting system call (PROPAG, columns); marginal totals below]

Totals per corrupted system call (LLAMAD):
LL1 31, LL10 4, LL13 5, LL15 19, LL16 12 (9 + 3), LL17 1, LL18 5, LL19 1, LL20 6, LL21 2, LL22 34 (29 + 5), LL23 4, LL24 19, LL28 32, LL3 5, LL30 23, LL4 14, LL5 14, LL50 29, LL6 4, LL8 4, LL9 2. Total: 270.

Totals per detecting system call (PROPAG):
LL10 8, LL13 5, LL15 19, LL16 9, LL17 4, LL18 5, LL20 9, LL22 29, LL23 4, LL24 28, LL28 32, LL3 5, LL30 23, LL4 45, LL5 14, LL50 29, LL9 2. Total: 270.

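A propagation table of this kind can be built from per-experiment records. The sketch below is illustrative only: the record layout, the call names `call_a` to `call_c` and the latencies are hypothetical sample data, not results from the study.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical records: (corrupted call, detecting call, latency in microseconds).
Record = Tuple[str, str, int]
records: List[Record] = [
    ("call_a", "call_a", 120),
    ("call_a", "call_b", 410),
    ("call_b", "call_b", 130),
    ("call_b", "call_c", 520),
    ("call_c", "call_c", 300),
]


def cross_tabulate(rows: List[Record]):
    """Count experiments per (corrupted, detecting) pair, plus both margins."""
    table = Counter((injected, detected) for injected, detected, _ in rows)
    row_totals = Counter(injected for injected, _, _ in rows)
    column_totals = Counter(detected for _, detected, _ in rows)
    return table, row_totals, column_totals


def propagated(rows: List[Record]) -> List[Record]:
    """Experiments in which the error was detected by a different call."""
    return [row for row in rows if row[0] != row[1]]


table, row_totals, column_totals = cross_tabulate(records)
```

Cells off the diagonal of `table` are the cases where the error travelled from one system call to another before being detected.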
Other Results
• Finally, the most critical system calls were identified, with the aim of improving their robustness and, in turn, the final OS dependability.
• For example, there are some data structures, related to the event control block, in which the injection produced many failures, and most of the time the system hung.
• This is because these structures store the list of tasks waiting for some event. If the injection corrupts that information, the system loses the sequence of the next actions and goes to an unsafe state without knowing how to react (the system hangs).
• This indicates where special attention is needed, since an error in those data structures could provoke critical failures in the system.

Conclusions
• After the experiments, the error detection coverage, error detection latency times, error propagation, typical OS error codes, etc. have been obtained.
• Fault injection into the code and data memory segments of the microkernel will be implemented too.
• Regarding possible improvements to MicroC/OS-II to increase its dependability, it should be taken into account that errors in certain data structures could provoke critical failures in the system.
• These data structures should implement some mechanism to protect the information they host.

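The slides do not say which protection mechanism is meant. As one possible illustration, the sketch below guards a list of waiting-task priorities with a CRC-32 checksum that is verified before every use. The class `GuardedWaitList` and the exception `CorruptionDetected` are hypothetical names, and the choice of a checksum is an assumption. As in µC/OS-II, a lower number is taken to mean a higher priority.

```python
import zlib
from typing import List, Optional


class CorruptionDetected(Exception):
    """Raised when the stored checksum no longer matches the data."""


class GuardedWaitList:
    """Hypothetical wait list whose contents are protected by a CRC-32."""

    def __init__(self, priorities: Optional[List[int]] = None) -> None:
        self.priorities: List[int] = list(priorities or [])
        self.checksum = self._digest()

    def _digest(self) -> int:
        # Serialise the list as text so that any integer value can be covered.
        return zlib.crc32(",".join(map(str, self.priorities)).encode("ascii"))

    def verify(self) -> None:
        if self._digest() != self.checksum:
            raise CorruptionDetected("wait list changed outside the API")

    def add(self, prio: int) -> None:
        self.verify()
        self.priorities.append(prio)
        self.checksum = self._digest()   # reseal after a legitimate update

    def highest_priority(self) -> int:
        """Return the highest priority waiting task (lowest number)."""
        self.verify()
        return min(self.priorities)
```

With such a check a corrupted list is reported as an error instead of being followed blindly, so the fault would fall into a detected category rather than causing a hang. The cost is extra computation on every access, which matters for a real-time kernel.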
Future Research
• In future work, these data have to be compared with other COTS RTOS working under the same conditions.
• RT fault injector to minimize intrusion (without internal debug support, intrusion > 0)
• Nexus-implemented fault injection:
  Other architecture: Motorola MPC565
  Intrusion tends to zero
  Preliminary results
  Better controllability and observability
  Best option to validate RTOS and applications

Contact Data
Juan Pardo
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain
Email: [email protected]
Web: http://www.disca.upv.es/gstf/