
FAILURE MODE AND EFFECTS ANALYSIS OF SOFTWARE-BASED AUTOMATION SYSTEMS

Haapanen Pentti, Helminen Atte

(VTT Industrial Systems)

In STUK this study was supervised by Marja-Leena Järvinen

STUK-YTO-TR 190 / AUGUST 2002

STUK • SÄTEILYTURVAKESKUS • STRÅLSÄKERHETSCENTRALEN

RADIATION AND NUCLEAR SAFETY AUTHORITY

Osoite/Address • Laippatie 4, 00880 Helsinki
Postiosoite / Postal address • PL / P.O. Box 14, FIN-00881 Helsinki, FINLAND
Puh./Tel. (09) 759 881, +358 9 759 881 • Fax (09) 759 88 500, +358 9 759 88 500 • www.stuk.fi

ISBN 951-712-584-4 (print)
ISBN 951-712-585-2 (pdf)
ISSN 0785-9325

Dark Oy, Vantaa/Finland 2002

The conclusions presented in the STUK report series are those of the authors and do not necessarily represent the official position of STUK.


Abstract

Failure mode and effects analysis (FMEA) is a well-known analysis method with an established position in traditional reliability analysis. The purpose of FMEA is to identify the possible failure modes of system components, evaluate their influence on system behaviour and propose proper countermeasures to suppress these effects. The generic nature of FMEA has enabled its wide use in various branches of industry, reaching from business management to the design of spaceships. The popularity and diverse use of the analysis method have led to multiple interpretations, practices and standards presenting the same analysis method.

FMEA is well understood at the system and hardware levels, where the potential failure modes are usually known and the task is to analyse their effects on system behaviour. Nowadays, more and more system functions are realised at the software level, which has aroused the urge to apply the FMEA methodology to software-based systems as well. Software failure modes are generally unknown ("software modules do not fail, they only display incorrect behaviour") and depend on the dynamic behaviour of the application. These facts set special requirements on the FMEA of software-based systems and make it difficult to realise.

In this report, failure mode and effects analysis is studied for use in the reliability analysis of software-based systems. More precisely, the target system of the FMEA is defined to be a safety-critical software-based automation application in a nuclear power plant, implemented on an industrial automation system platform. Through a literature study, the report tries to clarify the intriguing questions related to the practical use of software failure mode and effects analysis.

The study is a part of the research project "Programmable Automation System Safety Integrity assessment (PASSI)", belonging to the Finnish Nuclear Safety Research Programme (FINNUS, 1999–2002). In the project, various safety assessment methods and tools for software-based systems are developed and evaluated. The project is financed jointly by the Radiation and Nuclear Safety Authority (STUK), the Ministry of Trade and Industry (KTM) and the Technical Research Centre of Finland (VTT).

HAAPANEN Pentti, HELMINEN Atte (VTT Industrial Systems). Failure mode and effects analysis of software-based automation systems. STUK-YTO-TR 190. Helsinki 2002. 35 pp. + Appendices 2 pp.

Keywords: safety, safety analysis, reliability analysis, automation, programmable systems, software-based systems, reactor protection systems, nuclear reactor safety, failure mode and effects analysis


Tiivistelmä

Failure mode and effects analysis (FMEA) is a well-known analysis method with an established position in traditional reliability analyses. The aim of FMEA is to identify the possible failure modes of system components, evaluate their effects on system behaviour and propose suitable countermeasures to prevent harmful effects. The generic nature of FMEA has made it applicable to the most diverse targets, reaching from business management to spacecraft design. The method's wide popularity and varied uses have led to several different interpretations, practices and standards concerning the procedure.

FMEA is well mastered at the system and equipment levels, where the possible failure modes are generally known and the task is to analyse their effects on system behaviour. Nowadays, an ever larger share of system functions is implemented at the software level, which has aroused the desire to apply the FMEA methodology to programmable systems as well. The failure modes of software are generally not known ("software does not fail, it may only behave in an unanticipated way") and they depend on the dynamic behaviour of the application.

This report examines the application of failure mode and effects analysis to programmable systems. More precisely, the target of the analysis is taken to be a safety-critical nuclear power plant automation application implemented on an industrial automation system platform. Mainly on the basis of the literature, the report attempts to clarify the special problems of failure mode and effects analysis related to programmable technology.

The study has been part of the project "Safety assessment of programmable automation (PASSI)", belonging to the Finnish national nuclear safety research programme (FINNUS 1999–2002). In the project, various safety assessment methods for programmable systems have been developed and evaluated. The project has been financed jointly by the Radiation and Nuclear Safety Authority (STUK), the Ministry of Trade and Industry (KTM) and the Technical Research Centre of Finland (VTT).

HAAPANEN Pentti, HELMINEN Atte (VTT Tuotteet ja tuotanto). Failure mode and effects analysis of programmable automation systems. STUK-YTO-TR 190. Helsinki 2002. 35 pp. + appendices 2 pp.

Keywords: safety, safety analysis, reliability analysis, automation, programmable systems, reactor protection systems, reactor safety, failure mode and effects analysis


Contents

ABSTRACT

TIIVISTELMÄ

DEFINITIONS
Terms
Abbreviations

1 INTRODUCTION
1.1 Overview
1.2 Failure Mode and Effects Analysis
1.3 Software Failure Modes and Effects Analysis

2 SURVEY OF LITERATURE

3 FMEA STANDARDS
3.1 Overview
3.2 IEC 60812
3.3 MIL-STD-1629A
3.4 SAE J-1739

4 SWFMEA PROCEDURE
4.1 Level of analysis
4.2 Failure modes
4.3 Information requirements
4.4 Criticality analysis

5 CONCLUSIONS

BIBLIOGRAPHY
Standards & guidelines

APPENDIX 1 EXAMPLE OF THE FMEA-WORKSHEET (IEC 60812)

APPENDIX 2 AN EXAMPLE OF A SWFMEA FORM (TÜV NORD)


Definitions

Terms

Analysis approach
Variations in design complexity and available data will generally dictate the analysis approach to be used. There are two primary approaches for the FMECA. One is the hardware approach, which lists individual hardware items and analyzes their possible failure modes. The other is the functional approach, which recognizes that every item is designed to perform a number of outputs. The outputs are listed and their failures analyzed. For more complex systems, a combination of the functional and hardware approaches may be considered.

Block diagrams
Block diagrams that illustrate the operation, interrelationships, and interdependencies of the functions of a system are required to show the sequence and the series dependence or independence of functions and operations. Block diagrams may be constructed in conjunction with, or after, defining the system, and shall present the system breakdown into its major functions. More than one block diagram is sometimes required to represent alternative modes of operation, depending upon the definition established for the system.

Compensating provision
Actions that are available or can be taken by an operator to negate or mitigate the effect of a failure on a system.

Contractor
A private sector enterprise engaged to provide services or products within agreed limits specified by a procuring activity.

Corrective action
A documented design, process, procedure, or materials change implemented and validated to correct the cause of failure or design deficiency.

Criticality
A relative measure of the consequences of a failure mode and its frequency of occurrence.

Criticality analysis (CA)
A procedure by which each potential failure mode is ranked according to the combined influence of severity and probability of occurrence.

Damage effects
The result(s) or consequence(s) a damage mode has upon the operation, function, or status of a system or any component thereof. Damage effects are classified as primary damage effects and secondary damage effects.

Damage mode
The manner by which damage is observed. Generally describes the way the damage occurs.



Damage mode and effects analysis (DMEA)
The analysis of a system or equipment conducted to determine the extent of damage sustained from given levels of hostile damage mechanisms, and the effects of such damage modes on the continued controlled operation and mission completion capabilities of the system or equipment.

Design data and drawings
Design data and drawings identify each item and the item configuration that perform each of the system functions. System design data and drawings will usually describe the system's internal and interface functions, beginning at system level and progressing to the lowest indenture level of the system. Design data will usually include either functional block diagrams or schematics that will facilitate construction of reliability block diagrams.

Detection
The probability of the failure being detected before the impact of the effect is realized.

Detection mechanism
The means or method by which a failure can be discovered by an operator under normal system operation, or by the maintenance crew through some diagnostic action.

Environments
The conditions, circumstances, influences, stresses and combinations thereof surrounding and affecting systems or equipment during storage, handling, transportation, testing, installation, and use in standby status and mission operation.

Failure
Departure of a system from its required behaviour; failures are problems that users or customers see.

Failure cause
The physical or chemical processes, design defects, quality defects, part misapplication, or other processes which are the basic reason for failure, or which initiate the physical process by which deterioration proceeds to failure.

Failure definition
A general statement of what constitutes a failure of the item in terms of performance parameters and allowable limits for each specified output.

Failure effect
The consequence(s) a failure mode has on the operation, function, or status of an item. Failure effects are classified as local effect, next-higher-level effect, and end effect.

Failure mode
The manner by which a failure is observed. Generally describes the way the failure occurs and its impact on equipment operation.

Failure mode and effects analysis (FMEA)
A procedure by which each potential failure mode in a system is analyzed to determine the results or effects thereof on the system, and to classify each potential failure mode according to its severity.

FMECA-maintainability information
A procedure by which each potential failure is analyzed to determine how the failure is detected and the actions to be taken to repair the failure.

FMECA planning
Planning the FMECA work involves the contractor's procedures for implementing the specified requirements. Planning should include updating to reflect design changes and analysis results. Worksheet formats, ground rules, assumptions, identification of the level of analysis, failure definitions, and identification of coincident use of the FMECA by the contractor and other organizational elements should also be considered.


Functional approach
The functional approach is normally used when hardware items cannot be uniquely identified or when system complexity requires analysis from the top down.

Functional block diagrams
Functional block diagrams illustrate the operation and interrelationships between functional entities of a system as defined in engineering data and schematics.

Ground rules and assumptions
The ground rules identify the FMECA approach (e.g., hardware, functional or combination), the lowest level to be analyzed, and include statements of what might constitute a failure in terms of performance criteria. Every effort should be made to identify and record all ground rules and analysis assumptions prior to initiation of the analysis; however, ground rules and analysis assumptions may be adjusted as requirements change.

Hardware approach
The hardware approach is normally used when hardware items can be uniquely identified from schematics, drawings, and other engineering and design data. This approach is recommended for a part-level-up analysis, often referred to as the bottom-up approach.

Indenture levels
The item levels which identify or describe the relative complexity of assembly or function. The levels progress from the more complex (system) to the simpler (part) divisions.

Initial indenture level
The level of the total, overall item which is the subject of the FMECA.

Other indenture levels
The succeeding indenture levels (second, third, fourth, etc.) which represent an orderly progression to the simpler divisions of the item.

Interfaces
The systems, external to the system being analyzed, which provide a common boundary or service and are necessary for the system to perform its mission in an undegraded mode; for example, systems that supply power, cooling, heating, air services, or input signals.

Level of analysis
The level of analysis applies to the system hardware or functional level at which failures are postulated. In other words, it defines how the system being analyzed is segregated (e.g., a section of the system, component, sub-component, etc.).

Occurrence
Rating scale that shows the probability or frequency of the failure.

Reliability block diagrams
Reliability block diagrams define the series dependence, or independence, of all functions of a system or functional group for each life-cycle event.

Risk priority number (RPN)
The risk priority number is usually calculated as the product of the severity, occurrence, and detection ratings.

Severity
The consequences of a failure as a result of a particular failure mode. Severity considers the worst potential consequence of a failure, determined by the degree of injury, property damage, or system damage that could ultimately occur.

Single failure point
The failure of an item that would result in failure of the system and is not compensated for by redundancy or an alternative operational procedure.

Threat mechanism
The means or methods which are embodied or employed as an element of a man-made hostile environment to produce damage effects on a system and its components.


Trade-off study reports
These reports should identify areas of marginal and state-of-the-art design, and explain any design compromises and operating restraints agreed upon. This information will aid in determining the possible and most probable failure modes and causes in the system.

Undetectable failure
A postulated failure mode in the FMEA for which there is no failure detection method by which the operator is made aware of the failure.

Abbreviations

ACE Abnormal Condition or Event

AIAG Automotive Industry Action Group

BDA Bi-directional Analysis

CCA Cause-Consequence Analysis

CHA Component Hazard Analysis

COTS Commercial Off-the-Shelf

DFMEA Design Failure Modes and Effects Analysis

DCD Data Context Diagram

DFD Data Flow Diagram

ETA Event Tree Analysis

FMEA Failure Modes and Effects Analysis

FMECA Failure Modes, Effects, and Criticality Analysis

FPA Function Point Analysis

FPTN Failure Propagation and Transformation Notation

FSE Functions, Systems and Equipment

FTA Fault Tree Analysis

HA Hazard Analysis

HAZOP Hazard and Operability Analysis

IEC International Electrotechnical Commission

IEEE Institute of Electrical and Electronics Engineers

ISO International Organization for Standardization

MOD Ministry of Defence (UK)

O&SHA Operating and Support Hazard Analysis

PHA Preliminary Hazard Analysis

PHL Preliminary Hazard List

RCA Root Cause Analysis

RPN Risk Priority Number

SAD Software Architecture Description

SAE Society of Automotive Engineers

SCA Sneak-Circuit Analysis

SDD Software Design Description

SFMEA Software Failure Modes and Effects Analysis (SWFMEA)

SFMECA Software Failure Modes, Effects, and Criticality Analysis (SWFMECA)

SFTA Software Fault Tree Analysis

SHA Software Hazard Analysis

SRS Software Requirements Specification

SSCA Software Sneak Circuit Analysis

SSP Software Safety Plan

SSPP System Safety Program Plan

SSSP System Software Safety Process

SwHA Software Hazard Analysis

ZNA Zonal Hazard Analysis

1 Introduction

1.1 Overview

A failure mode and effects analysis (FMEA) can be described as a systematic way of identifying the failure modes of a system, item or function, and evaluating the effects of these failure modes on the higher level. The objective is to determine the causes of the failure modes and what could be done to eliminate or reduce the chance of failure. A bottom-up technique such as FMEA is an effective way to identify component failures or system malfunctions, and to document the system under consideration.

The FMEA discipline was originally developed in the United States military. Military procedure MIL-P-1629, titled Procedures for Performing a Failure Mode, Effects and Criticality Analysis, is dated November 9, 1949. The method was used as a reliability evaluation technique to determine the effect of system and equipment failures. Failures were classified according to their impact on military mission success and personnel/equipment safety. The concept that personnel and equipment are interchangeable does not apply, for example, in the modern manufacturing context of producing consumer goods, and therefore manufacturers in different areas of industry have established new sets of priorities, guidelines and standards of their own. However, military procedure MIL-P-1629 has served as a model for the later military standards MIL-STD-1629 and MIL-STD-1629A, which illustrate the most widely used FMEA procedures.

Outside the military, the formal application of FMEA was first adopted by the aerospace industry, where FMEA was already used during the Apollo missions in the 1960s. In the early 1980s, United States automotive companies began to formally incorporate FMEA into their product development process. A task force representing Chrysler Corporation, Ford Motor Company and General Motors Corporation developed the QS 9000 standard in an effort to standardise supplier quality systems. QS 9000 is the automotive analogy to the better-known standard ISO 9000. QS 9000-compliant automotive suppliers must utilise FMEA in the advanced quality planning process and in the development of their quality control plans. The effort made by the task force led to an industry-wide FMEA standard, SAE J-1739, issued by the Society of Automotive Engineers in 1994.

Academic discussion of FMEA originates from the 1960s, when studies of component failures were broadened to include the effects of component failures on the system of which they were a part. One of the earliest descriptions of a formal approach for performing an FMEA was given at the New York Academy of Sciences (see Coutinho, 1964). In the late 1960s and early 1970s, several professional societies published formal procedures for performing the analysis. The generic nature of the method assisted the rapid spread of FMEA to different application areas, and various practices fundamentally using the same analysis method were created. Along with the digital revolution, FMEA was applied in the analysis of software-based systems, and one of the first articles on software failure mode and effects analysis (SWFMEA) was Reifer (1979). Even though there is no explicit standard for SWFMEA, the standard IEC 60812, published in 1985, is often referred to when carrying out FMEA for software-based systems. A literature survey on the topic and a closer review of FMEA-related standards are given in Chapters 2 and 3.

1.2 Failure Mode and Effects Analysis

The systematic thinking promoted by FMEA is relevant when a new product or system is developed. FMEA seeks answers to questions such as: what could go wrong with the system, or with the process involved in creating the system; how badly might it go wrong; and what needs to be done to prevent failures? Kennedy (1998) lists the purposes of FMEA as follows:

• Identify potential design- and process-related failure modes. Ideally, the design or process is changed to remove potential problems in the early stages of development (Pries, 1998).

• Find the effects of the failure modes. Russomanno, Bonnell, and Bowles (1994) noted that FMEA allows a team to analyse the effect of each failure on system operation.

• Find the root causes of the failure modes. Barkai (1999) stated that an FMEA is designed to find the sources of the failures of a system.

• Prioritise recommended actions using the risk priority number. Kennedy noted that the risk priority number is computed using the probability of occurrence of the failure mode, the severity of the effect of the failure mode, and the probability of detection of the failure mode through manufacturing. Loch (1998) pointed out that FMEA could provide a prioritised list of potential failures.

• Identify, implement, and document the recommended actions. Pries noted that FMEA documents should be under formal document control. Kennedy stated that the recommended actions are to address failure modes with rankings that are considered unacceptable.
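The risk-priority prioritisation mentioned above can be sketched in a few lines of code. This is a minimal illustration only: the 1–10 rating scales and the example failure modes are assumptions for the sketch, not values or items taken from this report.

```python
# Minimal sketch of RPN-based prioritisation (illustrative values only).
# RPN = severity x occurrence x detection, each assumed here to be a 1-10 rating.

def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the three ratings."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("ratings are assumed to lie on a 1-10 scale")
    return severity * occurrence * detection

# Hypothetical failure modes (name, severity, occurrence, detection).
failure_modes = [
    ("sensor input out of range", 8, 3, 4),
    ("watchdog timer not reset", 9, 2, 2),
    ("stale value in shared buffer", 6, 5, 7),
]

# Prioritise: highest RPN first, so the team addresses the riskiest mode first.
ranked = sorted(failure_modes, key=lambda fm: rpn(fm[1], fm[2], fm[3]), reverse=True)
for name, s, o, d in ranked:
    print(f"RPN {rpn(s, o, d):4d}  {name}")
```

Note that a high RPN can come from a moderate-severity failure that is frequent and hard to detect, which is exactly why the product of the three ratings, rather than severity alone, drives the ranking.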

Other authors have presented some specifications of, and additional purposes for, FMEA, such as:

• Assess the safety of various systems and components (Russomanno, Bonnell, & Bowles, 1994). Pries (1998) pointed out that hazardous situations could be identified early in the development process. The developers can analyse the hazardous situation, its causes, and solutions to manage it.

• Highlight any significant problems with a design (Barkai, 1999). If feasible, the team modifies the design to avoid those problems. Johnson (1998) stated that FMEA could avoid expensive modifications to designs by identifying potential failures and preventing them.

• Serve as a prelude to test design (Pries, 1998). Pries noted that the software design FMEA is a listing of potential problems and is thus a listing of test cases. Pries pointed out that any effort put into the FMEA would be helpful in test case development. Pries stated that the test descriptions developed from the FMEA would challenge the software product at its weakest points by testing for anomalies.

In practice, the answers are sought in FMEA through an iterative analysis process, the main phases of which are illustrated in Fig. 1. The analysis process starts with the identification of the scope of the system and the functions the FMEA is to be applied to. The development process flow charts and design plans of the system are used to support the identification. After the subject of the FMEA is confirmed, the next step is to identify the potential failure modes in a gradual way. The technique of brainstorming has often proven to be a useful method for finding failure modes.

Figure 1. Main phases of FMEA: identification of system and functions; identification of failure modes; determination of effects of failure modes; identification of possible causes; documentation and risk reduction.

There exist many different worksheets to support the brainstorming procedure and the documentation of the FMEA overall. In the following phases, the effects and causes of the potential failures are determined. So-called cause-and-effect diagrams can be used to help in these phases. The final step is to document the process and take actions to reduce the risks due to the identified failure modes.
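The phases above can be sketched as a simple worksheet record that is filled in phase by phase. The field names and the example entries are illustrative assumptions, not taken from any particular standard's worksheet format.

```python
# Sketch of one FMEA worksheet entry filled in following the phases of Fig. 1:
# system/function -> failure mode -> effects -> causes -> documentation and
# risk reduction. Field names and example strings are illustrative only.
from dataclasses import dataclass, field

@dataclass
class FmeaEntry:
    function: str                                   # identification of system and functions
    failure_mode: str                               # identification of failure modes
    effects: list = field(default_factory=list)     # determination of effects of failure modes
    causes: list = field(default_factory=list)      # identification of possible causes
    actions: list = field(default_factory=list)     # documentation and risk reduction

# Hypothetical entry for a protection-system function.
entry = FmeaEntry(
    function="trip signal generation",
    failure_mode="signal not produced on demand",
)
entry.effects.append("protection action delayed at system level")
entry.causes.append("comparator threshold mis-set")
entry.actions.append("add self-test of comparator at start-up")

print(entry)
```

A real worksheet (see Appendix 1 for the IEC 60812 example) carries more columns, such as severity classes and detection provisions, but the phase-by-phase filling order is the same.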

The FMEA can generally be classified as either a product FMEA or a process FMEA, depending upon the application. The product FMEA analyses the design of a product by examining the way that the item's failure modes affect the operation of the product. The process FMEA analyses the processes involved in designing, building, using, and maintaining a product by examining the way that failures in the manufacturing or service processes affect the operation of the product. Some sources refer to a design FMEA and a process FMEA instead of a product FMEA and a process FMEA; however, both FMEA types focus on design (design of the product or design of the process), and therefore the latter terminology is used in the text. The classification and different categories of the product and process FMEA are presented in Fig. 2.

For software-based systems, both the product and the process FMEA are appropriate. In the process FMEA, the production of the hardware of a software-based system may involve chemical processes, machining operations, and the assembly of subsystem components. The software production includes the production routines reaching from requirement specification to the final testing of the software. Use includes all of the ways a product may be used; it covers operator and other human interfaces, the effects of over-stress conditions, and possible misuses of the system. Maintenance includes preventive and corrective maintenance as well as configuration control.

The product FMEA applies to the hardware or software of the product, or to the timing and sequencing of various system operations. Hardware includes, for example, a system's electrical, mechanical, and hydraulic subsystems and the interfaces between these subsystems. Software includes programs and their execution as tasks that implement various system functions. Software also includes the program interfaces with the hardware and those between different programs or tasks. Software can be embedded as a functional component in a self-contained system or executed on a general-purpose computer.

Figure 2. Types of FMEA (SAG, 1996). Product FMEA: hardware (electrical, mechanical, interfaces), software (program/task, HW interfaces, SW interfaces), timing/sequencing. Process FMEA: production (assembly, chemical, machining, software), maintenance (configuration control, maintenance operations, documentation, training), use (modes, human interface, overstress, user profiles, documentation, training).

1.3 Software Failure Modes and Effects Analysis

Performing FMEA for a mechanical or electrical system is usually a more straightforward operation than it is for a software-based system. The failure modes of components such as relays and resistors are generally well understood. The reasons for the component failures are known, and their consequences may be studied. Mechanical and electrical components are supposed to fail due to some reason such as wearing, ageing or unanticipated stress. The analysis may not always be easy but, at the least, the safety engineers can rely on data provided by the component manufacturers, the results of tests, and feedback from available operational experience. For software-based systems the situation is different. The failure modes of software are generally unknown. Software modules do not fail; they only display incorrect behaviour. To find out this incorrect behaviour, the safety engineer has to apply his own knowledge of the software to set up an appropriate FMEA approach.

The main phases of SWFMEA are similar to the phases shown in Figure 1. The performer of a SWFMEA has to find the appropriate starting point for the analyses, set up a list of relevant failure modes, and understand what makes those failure modes possible and what their consequences are. The failure modes in SWFMEA should be seen in a wide perspective, reflecting the incorrect behaviour of the software as mentioned above, and not, for example, just as typos in the software code. The failure mode and effects analyses for hardware and for software have certain distinguishing characteristics, which Ristord et al. (2001) list as follows:

Hardware FMEA:
• May be performed at functional level or part level.
• Applies to a system considered as free from failed components.
• Postulates failures of hardware components according to failure modes due to ageing, wearing or stress.
• Analyses the consequences of these failures at system level.
• States the criticality and the measures taken to prevent or mitigate the consequences.

Software FMEA:
• Is only practicable at functional level.
• Applies to a system considered as containing software faults which may lead to failure under triggering conditions.
• Postulates failures of software components according to functional failure modes due to potential software faults.
• Analyses the consequences of these failures at system level.
• States the criticality and describes the measures taken to prevent or mitigate the consequences. Measures can, for example, show that a fault leading to the failure mode will necessarily be detected by the tests performed on the component, or demonstrate that there is no credible cause leading to this failure mode given the software design and coding rules applied.
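The software-side procedure described above, postulating functional failure modes and recording their system-level consequences, can be sketched as follows. The generic failure-mode categories (no output, wrong value, too early, too late) and the example component are assumptions chosen for illustration; they are not a list taken from Ristord et al. (2001).

```python
# Illustrative SWFMEA sketch: postulate generic functional failure modes for a
# software component and record each mode's system-level consequence.
# The failure-mode categories below are an assumption for the sketch.

GENERIC_SW_FAILURE_MODES = [
    "no output produced",
    "output value incorrect",
    "output produced too early",
    "output produced too late",
]

def postulate(component: str, consequence_of: dict) -> list:
    """Build (component, failure mode, system-level consequence) rows."""
    rows = []
    for mode in GENERIC_SW_FAILURE_MODES:
        # Consequences not yet analysed are flagged for follow-up.
        consequence = consequence_of.get(mode, "to be analysed")
        rows.append((component, mode, consequence))
    return rows

# Hypothetical component and partially completed analysis.
rows = postulate(
    "setpoint comparison task",
    {
        "no output produced": "trip demand missed",
        "output produced too late": "trip delayed beyond safety margin",
    },
)
for component, mode, consequence in rows:
    print(f"{component}: {mode} -> {consequence}")
```

Enumerating a fixed set of functional failure modes for every component is one way to make the "unknown failure modes" problem tractable: the analyst postulates incorrect behaviours systematically instead of waiting for them to be discovered.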

After the short historical review and general overview of the methodology of failure mode and effects analysis given in this chapter, the structure of the report is as follows. In Chapter 2, a literature survey on the topic is carried out based on the published information available. The key standards related to failure mode and effects analysis are covered in Chapter 3. In Chapter 4, the special features of failure mode and effects analysis applied to software-based systems are discussed. The main conclusions are presented in Chapter 5.

2 Survey of literature

There is relatively little information published on the use of FMEA for information systems (Banerjee, 1995). Goddard (2000) noted that techniques for applying software FMEA to systems during their design have been largely missing from the literature. He pointed out that the number of papers dedicated to software FMEA has remained small. On the other hand, no papers have been written reporting on particular projects where FMEA was used unsuccessfully. The following survey has mainly been adapted from Signor (2000).

FMEA is a tool for identifying, analysing and prioritising failures. Pfleeger (1998) defined a failure as the departure of a system from its required behaviour; failures are problems that users or customers see. Loch (1998) noted that potential failures may concern functionality, reliability, ease of repair, process efficiency, choice of manufacturing method, ease of detection, choice of tools, or the degree to which machines are used. Reifer (1979) provided several examples of software failures. An aircraft autoland system may fail because of a convergence in its vectoring algorithms during extreme operating conditions. A safety system used to monitor a nuclear power plant may have a logic error that allows a failure to dampen an overloaded reactor to go undetected. An error in a spacecraft program for controlling the re-entry angle could cause skip-out and loss of the mission.

FMEA provides a framework for a detailed cause and effect analysis (Russomanno, Bonnell, & Bowles, 1994). FMEA requires a team to thoroughly examine and quantify the relationships among failure modes, effects, causes, current controls, and recommended actions. Lutz and Woodhouse (1996) defined a failure mode as the physical or functional manifestation of a failure. Luke (1995) noted that software requirements not being met are failure modes. An effect in FMEA is the consequence of the failure mode in terms of the operation, function, or status of the system (Bull, Burrows, Edge, Hawkins, and Woollons, 1996). The causes column in the FMEA worksheet contains the principal causes associated with each postulated failure mode (Onodera, 1997). Stamatis (1995) stated that current controls exist to prevent the cause of the failure from occurring. Bluvband and Zilberberg (1998) pointed out that current controls are used to detect failures. Controls include warning lights, gages, filters, and preventive actions such as design reviews and operator training. The recommended action in an FMEA may be a specific action or further study to reduce the impact of a failure mode (Stamatis, 1995).

By multiplying the values for severity, occurrence, and detection, the team obtains a risk priority number. Severity is the FMEA rating scale that shows the seriousness of the effect of the failure (McDermott, Mikulak, & Beauregard, 1996). Occurrence is the FMEA rating scale that shows the probability or frequency of the failure. Detection is the probability of the failure being detected before the impact of the effect is realized. Risk priority numbers help the team to select the recommended actions with the most impact.
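The risk priority number calculation described above can be sketched as follows. This is a minimal illustration only; the failure-mode entries and the 1–10 rating values are invented for the example and not taken from any particular standard:

```python
# Sketch of risk priority number (RPN) ranking: RPN = severity * occurrence * detection.
# The failure modes and their ratings below are hypothetical.

def rpn(severity, occurrence, detection):
    """Each rating is typically an integer on a 1-10 scale."""
    return severity * occurrence * detection

failure_modes = [
    # (description, severity, occurrence, detection)
    ("output message lost",       8, 3, 4),
    ("stale sensor value used",   6, 5, 7),
    ("startup self-test skipped", 9, 2, 2),
]

# Rank failure modes by RPN, highest first, to prioritise recommended actions.
ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
for desc, s, o, d in ranked:
    print(f"{desc}: RPN = {rpn(s, o, d)}")
```

Ranking by the product, rather than by any single scale, is what lets a frequent but hard-to-detect failure outrank a severe but well-controlled one.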

Ease of Use

FMEA has many-to-many relationships among failure modes, effects, causes, current controls, and recommended actions (Bluvband & Zilberberg, 1998). A significant FMEA application soon bogs down in confusion due to large amounts of repetitive and redundant data. This problem is typical of FMEA applications. Montgomery, Pugh, Leedham, and Twitchett (1996) noted that entries in the FMEA worksheet are voluminous and, as a result, very brief. They pointed out that these numerous brief entries make FMEA reports difficult to produce, difficult to understand, and difficult to maintain. Passey (1999) pointed out that although some FMEA effects arise repeatedly, FMEA does not group together the items causing the effects. Passey noted that the placement of failure modes with the same effect on multiple pages tends to obscure the major failure modes that need to be addressed.

Cooper (1999) noted that a good system provides quick results. However, Montgomery et al. (1996) noted that the traditional brainstorming process for FMEA is tedious, time-consuming, and error-prone. Wirth, Berthold, Krämer, and Peter (1996) stated that FMEA is laborious and time-consuming to carry out. Parkinson, Thompson, and Iwnicki (1998) stated that FMEA is very time-consuming and must be focused to avoid masses of paperwork and workforce hostility. Tsung (1995) noted that FMEA often suffers from duplication of effort and large amounts of redundant documentation. The information overload from repetitive, redundant, and scattered data obscures the relationships among the rows and columns of the FMEA, adding to the confusion. Information overload is the inability to extract needed knowledge from an immense quantity of information (Nelson, 1994).

Extent of FMEA Usage

FMEA has been widely adopted and has become standard practice in Japanese, American, and European manufacturing companies (Huang et al., 1999). Goddard (2000) stated that FMEA is a traditional reliability and safety analysis technique that has enjoyed extensive application to diverse products over several decades. Onodera (1997) investigated about 100 FMEA applications in various industries in Japan. He found that FMEA is being used in the areas of electronics, automobiles, consumer products, electrical generating power plants, building and road construction, telecommunications, etc. Stamatis (1995) pointed out that FMEA is used in the electromechanical, semiconductor and medical device industries and for computer hardware and software. Hoffman (2000) noted that FMEA is a mainline reliability and maintainability tool. The three big United States automakers request that their suppliers use FMEA (Chrysler Corporation, Ford Motor Company, & General Motors Corporation, 1995).

Perkins (1996) stated that FMEA applications in the aerospace and nuclear industries have seen an exponential increase in product software content and complexity. Stålhane and Wedde (1998) noted that several companies that develop embedded systems have been using FMEA for years. Goddard (2000) pointed out that application of FMEA to software has been somewhat problematic and is less common than system and hardware FMEA.

The FMEA in Information Systems

Goddard (2000) stated that there are two types of software FMEA for embedded control systems: system software FMEA and detailed software FMEA. System software FMEA can be used to evaluate the effectiveness of the software architecture without all the work required for detailed software FMEA. Goddard noted that system software FMEA analysis should be performed as early as possible in the software design process. This FMEA analysis is based on the top-level software design. Goddard stated that the system software FMEA should be documented in the tabular format used for hardware FMEA.

Goddard (2000) stated that detailed software FMEA validates that the software has been constructed to achieve the specified safety requirements. Detailed software FMEA is similar to component-level hardware FMEA. Goddard noted that the analysis is lengthy and labor intensive. He pointed out that the results are not available until late in the development process. Goddard argued that detailed software FMEA is often cost effective only for systems with limited hardware integrity.

Banerjee (1995) provided an insightful look at how teams should use FMEA in software development. Banerjee presented lessons learned in using FMEA at Isardata, a small German software company. FMEA requires teamwork and the pooled knowledge of all team members. Many potential failure modes are common to a class of software projects. Banerjee also pointed out that the corresponding recommended actions are common as well. Good learning mechanisms in a project team or in an organization greatly increase the effectiveness of FMEA. FMEA can improve software quality by identifying potential failure modes. Banerjee stated that FMEA can improve productivity through its prioritization of recommended actions.

Pries (1998) pointed out that a software FMEA can be used as a listing of potential test cases. The FMEA failure modes can be converted into test cases by developing procedures for stimulating the conditions that can lead to the failure. Pries argued that because each test case is bound to a particular failure mode, these test cases can become a particularly aggressive collection of test procedures.

Lutz and Woodhouse (1996) described their use of software FMEA in requirements analysis at the Jet Propulsion Laboratory. Software FMEA helped them with the early understanding of requirements, communication, and error removal. Lutz and Woodhouse noted that software FMEA is a time-consuming, tedious, manual task. Software FMEA depends on the domain knowledge of the analyst. Similarly, Fenelon and McDermid (1993) and Pfleeger (1998) pointed out that FMEA is highly labor intensive and relies on the experience of the analysts. Lutz and Woodhouse stated that a complete list of software failure modes cannot be developed.

Goddard (1993) described the use of software FMEA at Hughes Aircraft. Goddard noted that performing the software FMEA as early as possible allows early identification of potential failure modes. He pointed out that a static technique like FMEA cannot fully assess the dynamics of control loops. Goddard (1996) reported that a combination of Petri nets and FMEA improved the software requirements analysis process at Hughes Aircraft.

Moriguti (1997) provided a thorough examination of Total Quality Management for software development, including an overview of FMEA. The book pointed out that FMEA is a bottom-up analysis technique for discovering imperfections and hidden design defects. Moriguti suggested performing the FMEA on subsystem-level functional blocks, noting that when FMEA is performed on an entire product the effort is often quite large. Moriguti pointed out that using FMEA before the fundamental design is completed can prevent extensive rework. Moriguti explained that when prioritization is emphasized in the FMEA, the method is sometimes referred to as Failure Modes, Effects and Criticality Analysis (FMECA).

Pries (1998) outlined a procedure for using software design FMEA. Pries stated that a software design FMEA should start with the system or subsystem outputs listed in the Item and Function (leftmost) columns of the FMEA. The next steps are to list potential failure modes, effects of failures, and potential causes. Pries noted that current design controls can include design reviews, walkthroughs, inspections, complexity analysis, and coding standards. Pries argued that because reliable empirical numbers for occurrence values are difficult or impossible to establish, FMEA teams can set all occurrences to a value of 5 or 10. Pries noted that detection numbers are highly subjective and heavily dependent on the experience of the FMEA team.

Luke (1995) discussed the use of FMEA for software. He pointed out that early identification of potential failure modes is an excellent practice in software development because it helps in the design of tests to check for the presence of the failure modes. In FMEA, a software failure may have effects in the current module, in higher-level modules, and in the system as a whole. Luke suggested that a proxy such as the historical failure rate be substituted for occurrence.

Stamatis (1995) presented the use of FMEA with information systems. He noted that computer industry failures may result from software development process problems, coding, systems analysis, systems integration, software errors, and typing errors. Stamatis pointed out that failures may arise from the work of testers, developers, and managers. Stamatis noted that a detailed FMEA analysis may examine the source code for errors in logic and loops, parameters and linkage, declarations and initializations, and syntax.

Ammar, Nikzadeh, and Dugan (1997) used severity measures with FMEA for a risk assessment of a large-scale spacecraft software system. They noted that severity considers the worst potential consequence of a failure, whether in terms of degree of injury or of system damage. Ammar, Nikzadeh, and Dugan used four severity classifications. Catastrophic failures are those that may cause death or system loss. Critical failures are those that may cause severe injury or major system damage resulting in mission loss. Marginal failures are those that may cause minor injury or minor system damage resulting in delay, loss of availability or mission degradation. Minor failures are not serious enough to cause injury or system damage, but result in unscheduled maintenance or repair.

Maier (1997) described the use of FMEA during the development of robot control system software for a fusion reactor. He used FMEA to examine each software requirement for all possible failure modes. Failure modes included an unsent message, a message sent too early, a message sent too late, a wrong message, and a faulty message. FMEA causes included software failures, design errors, and unforeseen external events. Maier noted that for software failures, additional protective functions to be integrated into the software may need to be defined. For design errors, the errors may need to be removed or the design may need to be modified. Maier stated that unforeseen external events may be eliminated by protective measures or by changing the design. Maier recommended that the methodology he presented be applied at an early stage of the software development process to focus development and testing efforts.

Bouti, Kadi, and Lefrançois (1998) described the use of FMEA in an automated manufacturing cell. They noted that a good functional description of the system is necessary for FMEA. They recommended the use of an overall model that clearly specifies the system functions, and suggested the use of system modelling techniques that facilitate communication and teamwork. Bouti, Kadi, and Lefrançois argued that it is impossible to perform a failure analysis when functions are not well defined and understood. They pointed out that failure analysis is possible during the design phase because the functions are well established by then. Bouti, Kadi, and Lefrançois also noted that when several functions are performed by the same component, the possible failures of all functions should be considered.

Becker and Flick (1996) applied FMEA in Lockheed Martin's development of a distributed system for air traffic control. They described the failure modes and detection methods used in their FMEA. The classes of failure modes for their application included hardware or software stop, hardware or software crash, hardware or software hang, slow response, startup failure, faulty message, checkpoint file failure, internal capacity exceeded, and loss of service. Becker and Flick listed several detection methods. A task heartbeat monitor is coordination software that detects a missed function task heartbeat. A message sequence manager checks message sequence numbers to flag messages that are not in order. A roll call method takes attendance to ensure that all members of a group are present. A duplicate message check looks for the receipt of duplicate messages.

Stålhane and Wedde (1998) used FMEA with a traffic control system in Norway. They used FMEA to analyse changes to the system. They noted that potentially any change involving an assignment or a procedure call can change system parameters in a way that could compromise the system's safety. The FMEA pointed out code segments or procedures requiring further investigation. Stålhane and Wedde also stated that for an FMEA of code modifications, knowledge of the implementation and the programming language is very important.

FMEA Size

The sheer size of the FMEA presents a challenge. An FMEA worksheet is at least 16 columns wide, is often several levels deep, and is sometimes dozens or hundreds of pages long. Montgomery et al. (1996) noted that entries in the FMEA worksheet are voluminous. Bull et al. (1996) pointed out that a manual FMEA is a bulky document. Parkinson, Thompson, and Iwnicki (1998) stated that FMEA must be focused to avoid masses of paperwork. Tsung (1995) noted that FMEA often suffers from large amounts of documentation. Large documents implemented in Excel require much scrolling, and scrolling adversely affects ease of use (Plaisant et al., 1997).

FMEA Basics

An FMEA worksheet has multiple rows with fixed column headings. The FMEA worksheet is often arranged in 16 columns, 11 for the FMEA Process and 5 for the Action Results (McDermott, Mikulak, & Beauregard, 1996). Passey (1999) pointed out that no universal FMEA format has emerged. The FMEA Process columns normally include Item and Function; Potential Failure Mode; Potential Effect(s) of Failure; Severity; Potential Cause(s) of Failure; Occurrence; Current Controls; Detection; Risk Priority Number; Recommended Action; and Responsibility and Target Completion Date. The Action Results columns include Action Taken, Severity, Occurrence, Detection, and Risk Priority Number. Kukkal, Bowles, and Bonnell (1993) stated that failure effects at one level of a system need to be propagated as failure modes to the next higher level.


Breyfogle (1999) described a road map for creating FMEA entries as follows. First, note an input or function of a process or design. Then, list two or three ways the input or function can go wrong. Next, list one or more effects of failure. For each failure mode, list one or more causes of the input going wrong. Then, for each cause, list at least one method of preventing or detecting the cause. Last, enter the severity, occurrence, and detection values.
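The road map above can be sketched as a simple data structure whose fields mirror the generic worksheet columns. This is an illustrative sketch only; the field names and the sample entry are hypothetical, not prescribed by any of the cited sources:

```python
# Sketch of an FMEA worksheet entry built following Breyfogle's road map.
# Field names follow the generic worksheet columns; the sample entry is invented.
from dataclasses import dataclass
from typing import List

@dataclass
class FmeaEntry:
    item_function: str        # input or function of the process/design
    failure_mode: str         # one way the input/function can go wrong
    effects: List[str]        # one or more effects of the failure
    causes: List[str]         # one or more causes of the input going wrong
    controls: List[str]       # at least one prevention/detection method per cause
    severity: int = 0         # 1-10 rating scales, entered last
    occurrence: int = 0
    detection: int = 0

    @property
    def rpn(self) -> int:
        # risk priority number = severity * occurrence * detection
        return self.severity * self.occurrence * self.detection

entry = FmeaEntry(
    item_function="pressure input filtering",
    failure_mode="filter passes out-of-range value",
    effects=["spurious trip signal"],
    causes=["range check coded with wrong limit"],
    controls=["code inspection", "unit test of limits"],
    severity=7, occurrence=3, detection=4,
)
print(entry.rpn)  # 7 * 3 * 4 = 84
```

Keeping effects, causes and controls as lists reflects the many-to-many relationships noted earlier, which a flat paper worksheet forces into repeated rows.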

Stamatis (1995) wrote the most comprehensive reference available for FMEA (Schneider, 1996). Stamatis stated that there are four types of FMEA: the system FMEA, the design FMEA, the process FMEA, and the service FMEA. All types of FMEA produce a list of failure modes ranked by the risk priority numbers and a list of recommended actions to reduce failures or to improve failure detection.

The system FMEA is used to analyse systems and subsystems in the early concept and design phase (Stamatis, 1995). A system FMEA analyses potential failure modes between the functions of the system caused by system deficiencies. This FMEA includes the interactions between systems and the elements of the system. A significant output of the system FMEA is a list of potential system functions that could detect potential failure modes.

The design FMEA is used to analyse products before they are released to manufacturing (Stamatis, 1995). A design FMEA analyses failure modes caused by deficiencies in the design of a system. Significant outputs of the design FMEA are a list of critical and/or significant characteristics of the design; a list of potential parameters for testing, inspection, and/or detection methods; and a list of potential recommended actions for the critical and significant characteristics of the design.

The process FMEA is used to analyse manufacturing and assembly processes (Stamatis, 1995). A process FMEA analyses failure modes caused by process or assembly deficiencies. Significant outputs of the process FMEA are a list of critical and/or significant characteristics of the process and a list of potential recommended actions to address the critical and significant characteristics.

The service FMEA is used to analyse services before they reach the customer (Stamatis, 1995). A service FMEA analyses failure modes such as tasks, errors, and mistakes caused by system or process deficiencies. Significant outputs of the service FMEA are a list of critical or significant tasks or processes, a list of potential bottleneck processes or tasks, and a list of potential functions for monitoring the system or process.

Hoffman (2000) noted that today's commercially available FMEA tools are nothing more than documentation aids. Russomanno, Bonnell, and Bowles (1994) and Montgomery et al. (1996) pointed out that these packages do not include any intelligence but merely assist with clerical functions, data collection, database manipulations, and automatic report generation.

Ristord et al. (2001) describe the application of the FMEA procedure to the new nuclear instrumentation system (NIS) at the Tihange nuclear power plant in Belgium. The NIS is implemented on SPINLINE 3, a generic software-based safety system platform by Schneider Electric Industries. The choice of a software-based technology raised the issue of the risk of common cause failure (CCF) due to the same software being used in redundant independent units. Implementing functional diversity or equipment diversity was considered but found either not practicable or of little value in this context. Because of the possible consequences of the NIS not performing its protection function on demand, the licensing authority required an FMEA oriented toward the software CCF (SCCF) risk as part of the safety case.

This FMEA was performed on the NIS architecture, the SPINLINE 3 operational system software and the three application software packages (i.e. source, intermediate and power range). The software FMEA experience included the adaptation of the principles of FMEA to the analysis of a software program, the choice of a "block of instructions" (BI) approach to identify the components to analyse, the definition of the software failure modes associated with the BIs, and the feedback of experience.

3 FMEA standards

3.1 Overview

As mentioned in the previous chapters, various industries have established their own FMEA standards to provide the best support for the product and process FMEA related to a specific branch of industry. Aerospace and defence companies have usually referred to the MIL-STD-1629A standard. Automotive suppliers have mostly used SAE J-1739 or the FMEA methodologies provided by Chrysler, Ford, and General Motors. Other industries have generally adopted one of these or the IEC 60812 standard, customising it to meet the specific requirements and needs of the industry. A short review of the standards referred to in the text is given in this chapter.

3.2 IEC 60812

IEC 60812, published by the International Electrotechnical Commission, describes a failure mode and effects analysis, and a failure mode, effects and criticality analysis. The standard gives guidance on how the objectives of the analysis can be achieved when using FMEA or FMECA as risk analysis tools. The following information is included in the standard:
• the procedural steps necessary to perform an analysis
• identification of appropriate terms, assumptions, criticality measures and failure modes
• determination of basic principles
• a form for documenting the FMEA/FMECA
• a criticality grid to evaluate failure effects.

3.3 MIL-STD-1629A

MIL-STD-1629A, dated November 24th, 1980, was published by the United States Department of Defense. The standard establishes requirements and procedures for performing a failure mode, effects, and criticality analysis. In the standard, FMECA is presented as a means to systematically evaluate and document the potential impact of each functional or hardware failure on mission success, personnel and system safety, system performance, maintainability and maintenance requirements. Each potential failure is ranked by the severity of its effect so that appropriate corrective actions may be taken to eliminate or control the risks of potential failures. The document details the functional block diagram modelling method and defines severity classifications and criticality numbers. The following sample formats are provided by the standard:
• failure mode and effects analysis
• criticality analysis
• FMECA maintainability information
• damage mode and effects analysis
• failure mode, effects, and criticality analysis plan.

MIL-STD-1629A was cancelled by the standard authority on August 4th, 1998. Users were referred to various national and international documents for information regarding failure mode, effects, and criticality analysis.

3.4 SAE J-1739

The document provides guidance on the application of the failure mode and effects analysis technique, focusing on the product, process and plant machinery FMEA. The standard outlines the product and process concepts for performing FMEA on plant machinery and equipment, and provides a format for documenting the study. The following information is included in the document:
• FMEA implementation
• what is an FMEA?
• a format for documenting the product/process FMEA on machinery
• development of a product/process FMEA
• suggested evaluation criteria for severity, detection and occurrence of failure.

4 SWFMEA procedure

The failure mode and effects analysis procedures were originally developed in the post-World War II era for mechanical and electrical systems and their production processes, before the emergence of software-based systems on the market. Even today, common standards and guidelines only briefly consider the handling of malfunctions caused by software faults and their effects in FMEA, and often state that this is possible only to a limited extent (IEC 60812). No specific standard or guideline concentrating on the special issues of software-based system FMEA has yet been published.

A general FMEA/FMECA procedure is described in the IEC 60812 standard. The present official version of this standard stems from 1985. IEC TC56/WG 2 is, however, working on a revision of the standard; a Committee Draft (CD) was distributed to the committee members in spring 2001.

The general FMEA/FMECA procedure presented in IEC 60812 agrees well with the other major FMEA standards and guidelines presented in Chapter 3. These procedures constitute a good starting point also for the FMEA of software-based systems. Depending on the objectives, level, etc. of the specific FMEA, this procedure can easily be adapted to the actual needs case by case.

This chapter does not present the whole structure of the FMEA procedure (refer to IEC 60812). Instead, only the special aspects connected with the incorporation of software failure modes and effects in the failure mode and effects analysis of a software-based automation system application are considered. These considerations are based on the findings from the (rather sparse) literature on the subject.

A complete FMEA for a software-based automation system shall include both the hardware and software failure modes and their effects on the final system function. In this report we, however, confine ourselves to the software part of the analysis, the hardware part being more straightforward (though not without difficulties of its own).

As a starting point for the following considerations, the main procedural steps of an FMEA are defined to be (adapted from IEC 60812):
• define the system boundaries for the analysis,
• understand the system requirements and function,
• define the failure/success criteria,
• break the system down into elements,
• determine each element's failure modes and their failure effects and record these,
• summarise each failure effect, and
• report the findings.

FMEA is documented on a tabular worksheet; an example of a typical FMEA worksheet (derived from IEC 60812) is presented in App. 1; this can readily be adapted to the specific needs of each actual FMEA application. App. 2 gives an example of an SWFMEA form.

Criticality analysis is a quantitative extension of the (qualitative) FMEA. Using the failure effects identified by the FMEA, each effect is classified according to the severity of the damage it causes to people, property or the environment. The frequency at which the effect comes about, together with its severity, defines the criticality. A set of severity and frequency classes is defined, and the results of the analysis are presented in the criticality matrix (Fig. 3; IEC 60812). The SAE J-1739 standard adds a third aspect to the criticality assessment by introducing the concept of the Risk Priority Number (RPN), defined as the product of three entities: Severity, Occurrence (i.e. frequency) and Detection.

Figure 3. The criticality matrix.

4.1 Level of analysis

The FMEA is a bottom-up method, in which the system under analysis is first hierarchically divided into components (Fig. 4). The division shall be done in such a way that the failure modes of the components at the bottom level can be identified. The failure effects of the lower-level components constitute the failure modes of the upper-level components.
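The bottom-up propagation described above, in which the failure effects identified at one level become the failure modes analysed at the next level up, can be sketched roughly as follows. The two-level component tree and the effect entries are invented for illustration:

```python
# Sketch of bottom-up FMEA propagation: the failure effects identified for the
# bottom-level components become the failure modes of the next level up.
# The components and effect tables below are hypothetical.

bottom_level_effects = {
    # component -> effects of its local failure modes
    "input module": ["measurement value missing"],
    "filter block": ["measurement value out of date"],
}

def propagate(children_effects):
    """Collect the children's failure effects as the parent's failure modes."""
    parent_failure_modes = []
    for effects in children_effects.values():
        parent_failure_modes.extend(effects)
    return parent_failure_modes

# At the next level up, these effects are analysed as failure modes of the subsystem.
subsystem_failure_modes = propagate(bottom_level_effects)
print(subsystem_failure_modes)
```

The same step is repeated level by level until the effects on the final system function are reached.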

The basic factors influencing the selection of the proper lowest level of system decomposition are the purpose of the analysis and the availability of system design information.

When considering the FMEA of a software-based automation system application, the ultimate purpose of the analysis is usually to find out whether there are software faults that in some situation could jeopardise the proper functioning of the system. The lowest-level components from which the analysis is started are then units of software executed sequentially in a single processor or concurrently in parallel processors of the system.

A well-established way to realise software-based safety-critical automation applications nowadays is to implement the desired functions on an automation system platform, e.g. on a programmable logic system or on a more general automation system. The software in this kind of realisation is first divided into system software and application software. The system software (a simple operating system) can further be divided into the system kernel and system services. The kernel functions include e.g. the system boot, initialisation and self-tests, whereas the system services take care of e.g. different data handling operations. The platform also includes a library of standardised software components, the function blocks ("modules"), from which the application is constructed by connecting ("configuring") adequate function blocks to form the desired application functions, which rest on the system service support (see Fig. 5). The automation system platform also contains a graphical design tool. Using this tool the designer converts the functional specifications of the application into a functional block diagram, which is then automatically translated into executable object code. The total design documentation of the application thus consists, crudely, of the requirement specification, the functional specification, the functional block diagrams and the listing of the object code.
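The configuration principle described above, in which an application function is built by wiring together standardised library function blocks, can be illustrated roughly as follows. The block names, limit values and trip logic are invented for the illustration and do not describe any actual platform:

```python
# Rough illustration of composing an application function from standardised
# library function blocks. All names and values are hypothetical.

def limit_block(x, high):
    """Library function block: high-limit comparison."""
    return x > high

def and_block(a, b):
    """Library function block: logical AND."""
    return a and b

def trip_logic(pressure, temperature):
    """Application function configured from the library blocks above:
    trip when both pressure and temperature exceed their limits."""
    p_high = limit_block(pressure, high=155.0)
    t_high = limit_block(temperature, high=320.0)
    return and_block(p_high, t_high)

print(trip_logic(160.0, 325.0))  # True: both limits exceeded
print(trip_logic(160.0, 300.0))  # False: temperature within limit
```

In a real platform the wiring is drawn graphically and translated to object code automatically; the sketch only shows why the individual blocks look like natural bottom-level components for an FMEA.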

Figure 4. The FMEA level hierarchy (adapted from IEC 60812).

Figure 5. Construction of the application function.

A natural way of thinking would then suggest that the FMEA of a software-based application could be started from the function block diagrams, taking the individual function blocks as the lowest-level components in the analysis. In practice this procedure, however, seems unfeasible. Firstly, this approach in most cases leads to rather extensive and complicated analyses, and secondly, the failure modes of the function blocks are not known.

For example, Ristord et al. (2001) state that software FMEA is practicable only at the (application) function level. They consider the SPINLINE 3 application software to be composed of units called Blocks of Instructions (BIs) executed sequentially. The BIs are defined by the following properties:
• BIs are either "intermediate" (a sequence of smaller BIs) or "terminal" (they cannot be decomposed into smaller BIs).
• They have only one "exit" point. They produce output results from inputs and possibly memorized values. Some BIs have direct access to hardware registers.
• They have a bounded execution time (i.e. the execution time is always smaller than a fixed value).
• They exchange data through memory variables. A memory variable is most often written by only one BI and may be read by one or several BIs.

Their decomposition of the SPINLINE 3 software is presented in Fig. 6.

4.2 Failure modes

When a proper way of decomposing the system under analysis has been found, the next step is to define the failure modes of the components. For hardware components this is in general straightforward and can be based on operational experience of the same or similar components. Component manufacturers often give failure modes and frequencies for their products.

For software components such information does not exist and the failure modes are unknown (if a failure mode were known, it would be corrected). Therefore, the definition of failure modes is one of the hardest parts of the FMEA of a software-based system. The analysts have to apply their own knowledge of the software and postulate the relevant failure modes.

Reifer (1979) gives the following general list of failure modes, based on the analysis of three large software projects:
• Computational
• Logic
• Data I/O
• Data Handling
• Interface
• Data Definition
• Data Base
• Other.

Ristord et al. (2001) give the following list of five general-purpose failure modes at the processing unit level:
• the operating system stops
• the program stops with a clear message
• the program stops without a clear message
• the program runs, producing obviously wrong results
• the program runs, producing apparently correct but in fact wrong results.

Figure 6. First levels of the decomposition of the SPINLINE 3 software (Ristord et al., 2001).

(The boxes of the figure include: Digital input, Analog input, Self testing, Acquisition, Application, Initialisation and Processing.)


More specifically, they define for the SPINLINE 3 BIs the following list of failure modes:
• the BI execution does not end through the “exit” point
• the BI execution time does not meet time limits
• the BI does not perform the intended actions or performs unintended actions:
  • it does not provide the expected outputs
  • it modifies variables that it shall not modify
  • it does not interact as expected with I/O boards
  • it does not interact as expected with CPU resources
  • it modifies code memory or constants.

Lutz et al. (1999b) divide the failure modes into those concerning the data and those concerning the processing of data. For each input and each output of the software component, they consider the following four failure modes:
• Missing Data (e.g., lost message, data loss due to hardware failure)
• Incorrect Data (e.g., inaccurate data, spurious data)
• Timing of Data (e.g., obsolete data, data arrives too soon for processing)
• Extra Data (e.g., data redundancy, data overflow).

For each event (step in processing), on the other hand, they consider the following four failure modes:
• Halt/Abnormal Termination (e.g., hung or deadlocked at this point)
• Omitted Event (e.g., event does not take place, but execution continues)
• Incorrect Logic (e.g., preconditions are inaccurate; event does not implement intent)
• Timing/Order (e.g., event occurs in wrong order; event occurs too early/late).

Becker et al. (1996) give the following classes of failure modes:
• hardware or software stop
• hardware or software crash
• hardware or software hang
• slow response
• startup failure
• faulty message
• checkpoint file failure
• internal capacity exceeded
• loss of service.

They also list the following detection methods:
• a task heartbeat monitor is coordination software that detects a missed function task heartbeat
• a message sequence manager checks the sequence numbers for messages to flag messages that are not in order
• a roll call method takes attendance to ensure that all members of a group are present
• a duplicate message check looks for the receipt of duplicate messages.
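Two of these detection methods, the message sequence check and the duplicate message check, can be sketched as follows (the message stream is reduced to bare sequence numbers for illustration):

```python
# Sketch of two detection methods listed by Becker & Flick (1996):
# a message sequence check and a duplicate message check.

def check_messages(messages):
    """Flag out-of-order and duplicate sequence numbers in a message stream."""
    seen = set()
    out_of_order, duplicates = [], []
    last = None
    for seq in messages:
        if seq in seen:
            duplicates.append(seq)          # duplicate message check
        elif last is not None and seq != last + 1:
            out_of_order.append(seq)        # sequence manager check
        seen.add(seq)
        last = seq
    return out_of_order, duplicates
```

In an FMEA worksheet such checks would appear in the “symptom detected by” column for the corresponding failure modes.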

Still further, a very comprehensive classification of software failure modes is given by Beizer (1990).

IEC 60812 also gives guidance on the definition of failure modes and contains two tables of examples of typical failure modes. These are, however, rather general and/or concern mainly mechanical systems, and thus give little support for software FMEA. The same applies to other common standards and guidelines.

The FMEA also includes the identification and description of possible causes for each possible failure mode. Software failure modes are caused by inherent design faults in the software; therefore, when searching for the causes of postulated failure modes, the design process should be examined. IEC 60812 gives a table of possible failure causes, which are largely applicable also to software.

4.3 Information requirements

The IEC 60812 standard defines rather comprehensively the information needs of the general FMEA procedure. It emphasises the free availability of all relevant information and the active co-operation of the designer. The main areas of information in this standard are:
• system structure
• system initiation, operation, control and maintenance
• system environment
• modelling
• system boundary
• definition of the system’s functional structure
• representation of system structure
• block diagrams
• failure significance and compensating provisions.

A well-documented software-based system design mostly covers these items, so it is more a question of the maturity of the design process than of the specialities of software-based systems.

4.4 Criticality analysis

As stated earlier, the criticality analysis is a quantitative extension of the FMEA, based on the severity of failure effects and the frequency of failure occurrence, possibly augmented with the probability of failure detection. For an automation system application, the severity is determined by the effects of automation function failures on the safety of the controlled process. Even if difficult, the severity of a single low-level component failure mode can in principle be concluded straightforwardly backwards from the top level.

The frequency of occurrence is much harder to define for a software-based system. The manifestation of an inherent software fault as a failure depends not only on the software itself, but also on the operational profile of the system, i.e. on the frequency of the triggering event that causes the fault to lead to a failure. This frequency is usually not known. Luke (1995) proposed that a proxy such as McCabe’s complexity value or Halstead’s complexity measure be substituted for occurrence. Luke argued that there is really no way to know a software failure rate at any given point in time, because the defects have not yet been discovered. He stated that design complexity is positively linearly correlated with defect rate, and therefore suggested using McCabe’s complexity value or Halstead’s complexity measure to estimate the occurrence of software defects.

Also the probability of detection is hard to define, since only a part of software failures can be detected with self-diagnostic methods.
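Luke's proposal can be sketched as follows. The mapping from cyclomatic complexity to a 1–10 occurrence rating, and the example ratings, are invented for illustration and are not taken from Luke (1995):

```python
# Sketch of a criticality ranking in the spirit of Luke (1995): since the
# software failure rate is unknown, the occurrence rating is derived from
# a complexity proxy (here McCabe's cyclomatic complexity; the threshold
# values below are invented for illustration).

def occurrence_from_complexity(cyclomatic: int) -> int:
    """Map a cyclomatic complexity value to a 1-10 occurrence rating."""
    if cyclomatic <= 10:
        return 2      # simple, well-testable module
    if cyclomatic <= 20:
        return 5
    if cyclomatic <= 50:
        return 8
    return 10         # very complex, defect-prone module


def risk_priority_number(severity: int, cyclomatic: int, detection: int) -> int:
    """Classic RPN = severity x occurrence x detection (each rated 1-10)."""
    return severity * occurrence_from_complexity(cyclomatic) * detection


# A severe failure mode in a complex, poorly self-diagnosed module ranks high:
rpn = risk_priority_number(severity=9, cyclomatic=35, detection=7)
```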

5 Conclusions

Failure mode and effects analysis (FMEA) is a well-established reliability engineering tool widely used in various fields of industry. The purpose of FMEA is to identify the possible failure modes of the system components, evaluate their influences on system behaviour and propose proper countermeasures to suppress these effects. FMEA is primarily adapted to the study of material and equipment failures and can be applied to systems based on different technologies (electrical, mechanical, hydraulic etc.) and combinations of these technologies (IEC 60812).

FMEA is well understood at the systems and hardware levels, where the potential failure modes usually are known and the task is to analyse their effects on system behaviour. Nowadays, more and more system functions are realised at the software level, which has aroused the urge to apply the FMEA methodology also to software-based systems. Software failure modes generally are unknown (“software modules do not fail, they only display incorrect behaviour”) and depend on the dynamic behaviour of the application. These facts set special requirements on the FMEA of software-based systems and make it difficult to realise. A common opinion is that FMEA is applicable to software-based systems only to a limited extent.

In any case, the total verification and validation (V&V) process of a software-based safety-critical application shall also include the failure mode and effects analysis of the system at a proper level. FMEA can be used in all phases of the system life cycle from requirement specification to design, implementation, operation and maintenance. Most benefit from the use of FMEA can be achieved in the early phases of the design, where it can reveal weak points in the system structure and thus avoid expensive design changes.

Perhaps the greatest benefit of FMEA applied to a software-based system is the guidance it can give to other V&V efforts. By revealing the possible weak points it can, e.g., help generate test cases for system testing.

The FMEA alone cannot provide the necessary evidence for the qualification of software-based safety-critical applications in nuclear power plants; the method should be combined with other safety and reliability engineering methods. Maskuniitty et al. (1994) propose a stepwise refined, iterative application of fault tree analysis (FTA) and failure mode and effects analysis, as described in Fig. 7. This is similar to the Bi-directional Analysis (BDA) method proposed by Lutz et al. (1997, 1999b). The preliminary fault tree analysis and the determination of minimal cut sets direct the identification of failure modes to those that are most significant for the system reliability; the effects analysis of these failures then steers the refinement of the fault trees and the final detailed FMEA.
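The preliminary fault tree analysis step can be sketched with a toy example: enumerate the minimal cut sets of a small fault tree and take the basic events in the shortest cut sets as the first targets of the FMEA. The tree below is invented for illustration:

```python
from itertools import product

# Sketch of the preliminary FTA step: enumerate minimal cut sets of a
# tiny (invented) fault tree, then select the basic events in the
# shortest cut sets as the first targets of the preliminary FMEA.


def cut_sets(gate):
    """Expand an ('AND'|'OR', children) tree; basic events are strings."""
    if isinstance(gate, str):
        return [frozenset([gate])]
    op, children = gate
    child_sets = [cut_sets(c) for c in children]
    if op == "OR":
        return [cs for sets in child_sets for cs in sets]
    # AND: combine one cut set from each child
    return [frozenset().union(*combo) for combo in product(*child_sets)]


def minimal(sets):
    """Keep only cut sets that contain no other cut set."""
    return [s for s in sets if not any(t < s for t in sets)]


tree = ("OR", [
    "cpu_halt",
    ("AND", ["sensor_A_fails", "sensor_B_fails"]),
    ("AND", ["application_bug", "self_test_misses_it"]),
])
mcs = minimal(cut_sets(tree))
shortest = min(mcs, key=len)   # the single-event cut set
```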

The fault trees and reliability block diagrams are basically static approaches that cannot capture all the dynamical aspects of the software. Other approaches, e.g. Petri net analysis, dynamic flowgraph analysis etc., have been proposed to catch these effects in the reliability and safety analysis of software-based systems (Pries, 1998).


Figure 7. The analysis steps of combined FTA/FMEA procedure.

(The boxes of the figure contain the following steps:
• description and familiarization of the system
• preliminary fault tree analysis: fault tree construction, minimal cut set search
• preliminary FMEA: identification of the failure modes corresponding to the fault tree basic events in the shortest minimal cut sets
• detailed fault tree analysis: modification of the fault tree using the FMEA results, minimal cut set search, documentation
• detailed FMEA: more detailed software FMEA, documentation.)

Bibliography

Addy, E.A., 1991, Case Study on Isolation of Safety-Critical Software. In Proceedings of the 6th Annual Conference on Computer Assurance, NIST/IEEE, Gaithersburg, MD, pp. 75–83.

Alur, R., Henzinger, T.A. & Ho, P.H., 1996, Automatic Symbolic Verification of Embedded Systems. IEEE Transactions on Software Engineering 22, 3, pp. 181–201.

Ammar, H.H., Nikzadeh, T. & Dugan, J.B., 1997, A methodology for risk assessment of functional specification of software systems using colored Petri nets. Proceedings, Fourth International Software Metrics Symposium, pp. 108–117.

Atlee, J.M. & Gannon, J., 1993, State-Based Model Checking of Event-Driven System Requirements. IEEE Transactions on Software Engineering 19, 1, pp. 24–40.

Banerjee, N., 1995, Utilization of FMEA concept in software lifecycle management. Proceedings of Conference on Software Quality Management, pp. 219–230.

Barkai, J., 1999, Automatic generation of a diagnostic expert system from failure mode and effects analysis (FMEA) information. Society of Automotive Engineers, 1449, pp. 73–78.

Becker, J.C. & Flick, G., 1996, A practical approach to failure mode, effects and criticality analysis (FMECA) for computing systems. High-Assurance Systems Engineering Workshop, pp. 228–236.

Beizer, B., 1990, Software Testing Techniques. John Wiley & Sons.

Bell, D., Cox, L., Jackson, S. & Schaefer, P., 1992, Using Causal Reasoning for Automated Failure Modes & Effects Analysis (FMEA). IEEE Proc Annual Reliability and Maintainability Symposium, pp. 343–353.

Ben-Daya, M. & Raouf, A., 1996, A revised failure mode and effects analysis model. International Journal of Quality & Reliability Management, 13(1), pp. 43–47.

Bestavros, A.A., Clark, J.J. & Ferrier, N.J., 1990, Management of sensori-motor activity in mobile robots. In Proceedings of the 1990 IEEE International Conference on Robotics and Automation, IEEE Computer Society Press, Cincinnati, OH, pp. 592–597.

Binder, R.V., 1997, Can a manufacturing quality model work for software? IEEE Software, 14(5), pp. 101–105.

Bluvband, Z. & Zilberberg, E., 1998, Knowledge base approach to integrated FMEA. ASQ’s 52nd Annual Quality Congress Proceedings, pp. 535–545.

Bondavalli, A. & Simoncini, L., 1990, Failure classification with respect to detection. In First Year Report, Task B: Specification and Design for Dependability, Volume 2. ESPRIT BRA Project 3092: Predictably Dependable Computing Systems, May 1990.

Bouti, A., Kadi, D.A. & Lefrançois, P., 1998, An integrative functional approach for automated manufacturing systems modeling. Integrated Computer-Aided Engineering, 5(4), pp. 333–348.


Bowles, J.B., 1998, The new SAE FMECA standard. 1998 Proceedings Annual Reliability and Maintainability Symposium, pp. 48–53.

Burns, D.J. & Pitblado, R.M., 1993, A modified HAZOP methodology for Safety Critical System Assessment. 7th meeting of the UK Safety Critical Club, Bristol, February 1993.

Burns, D. & McDermid, J.A., 1994, Real-time safety-critical systems: analysis and synthesis. Software Engineering Journal, vol. 9, no. 6, pp. 267–281, Nov. 1994.

Cha, S.S., Leveson, N.G. & Shimeall, T.J., 1988, Safety Verification in Murphy Using Fault Tree Analysis. Proc. 10th Int. Conf. on S/W Eng., Apr 1988, Singapore, pp. 377–386.

Cha, S.S., Leveson, N.G. & Shimeall, T.J., 1991, Safety Verification of Ada Programs Using Fault Tree Analysis. IEEE Software, 8, 4, pp. 48–59.

Chillarege, R., Bhandari, I., Chaar, J., Halliday, M., Moebus, D., Ray, B. & Wong, M.Y., 1992, Orthogonal Defect Classification—A Concept for In-Process Measurements. IEEE Transactions on Software Engineering 18, 11, pp. 943–956.

Cichocki, T. & Gorski, J., 2001, Formal support for fault modelling and analysis. SAFECOMP 2001, Budapest, Hungary, 26–28 September 2001.

Coutinho, J.S., 1964, Failure-Effect Analysis. Trans. New York Academy of Sciences, pp. 564–585.

Crow, J. & Di Vito, B.L., 1996, Formalizing Space Shuttle Software Requirements. In Proceedings of the ACM SIGSOFT Workshop on Formal Methods in Software Practice, San Diego, CA.

DeLemos, R., Saeed, A. & Anderson, T., 1995, Analyzing Safety Requirements for Process Control Systems. IEEE Software, Vol. 12, No. 3, May 1995, pp. 42–52.

Dhillon, B.S., 1992, Failure modes and effects analysis-bibliography. Microelectronics Reliability, 32(5), pp. 719–731.

Dugan, J.B. & Boyd, M.A., 1993, Fault Trees & Markov Models for Reliability Analysis of Fault-Tolerant Digital Systems. Reliability Engineering & System Safety 39 (1993), pp. 291–307.

Ezhilchelvan, P.D. & Shrivastava, S.K., 1989, A classification of faults in systems. University of Newcastle upon Tyne.

Fencott, C. & Hebbron, B., 1995, The Application of HAZOP Studies to Integrated Requirements Models for Control Systems. ISA Transactions 34, pp. 297–308.

Fenelon, P., McDermid, J.A., Nicholson, M. & Pumfrey, D.J., 1994, Towards Integrated Safety Analysis and Design. ACM Applied Computing Review, 2(1), pp. 21–32, 1994.

Fenelon, P. & Hebbron, B.D., 1994, Applying HAZOP to software engineering models. In Risk Management And Critical Protective Systems: Proceedings of SARSS 1994, Altrincham, England, Oct. 1994, pp. 1/1–1/16, The Safety And Reliability Society.

Fenelon, P. & McDermid, J.A., 1993, An integrated tool set for software safety analysis. The Journal of Systems and Software, 21, pp. 279–290.

Fenelon, P. & McDermid, J.A., 1992, Integrated techniques for software safety analysis. In Proceedings of the IEE Colloquium on Hazard Analysis. Nov. 1992, Institution of Electrical Engineers.

Fragola, J.R. & Spahn, J.F., 1973, The Software Error Effects Analysis; A Qualitative Design Tool. In Proceedings of the 1973 IEEE Symposium on Computer Software Reliability, IEEE, New York, pp. 90–93.

Gilchrist, W., 1993, Modelling failure modes and effects analysis. International Journal of Quality and Reliability Management, 10(5), pp. 16–23.

Goble, W.M. & Brombacher, A.C., 1999, Using a failure modes, effects and diagnostic analysis (FMEDA) to measure diagnostic coverage in programmable electronic systems. Reliability Engineering & System Safety, 66(2), pp. 145–148, Nov. 1999.


Goddard, P.L., 2000, Software FMEA techniques. 2000 Proceedings Annual Reliability and Maintainability Symposium, pp. 118–123.

Goddard, P.L., 1996, A combined analysis approach to assessing requirements for safety critical real-time control systems. 1996 Proceedings Annual Reliability and Maintainability Symposium, pp. 110–115.

Goddard, P.L., 1993, Validating the safety of embedded real-time control systems using FMEA. 1993 Proceedings Annual Reliability and Maintainability Symposium, pp. 227–230.

Hall, F.M., Paul, R.A. & Snow, W.E., 1983, Hardware/Software FMECA. Proc Annual Reliability and Maintainability Symposium, pp. 320–327.

Hansen, K.M., Ravn, A.P. & Stavridou, V., 1993, From Safety Analysis to Software Requirements. IEEE Transactions on Software Engineering, Vol. 21, July 1993, pp. 279–290.

Hebbron, B.D. & Fencott, P.C., 1994, The application of HAZOP studies to integrated requirements models for control systems. In Proceedings of SAFECOMP ’94, Oct. 1994.

Heimdahl, M.P.E. & Leveson, N.G., 1996, Completeness and Consistency in Hierarchical State-Based Requirements. IEEE Transactions on Software Engineering 22, 6, pp. 363–377.

Heitmeyer, C., Bull, B., Gasarch, C. & Labaw, B., 1995, SCR: A Toolset for Specifying and Analyzing Requirements. In Proceedings of the 10th Annual Conference on Computer Assurance, IEEE, Gaithersburg, MD, pp. 109–122.

Herrman, D.S., 1995, A methodology for evaluating, comparing, and selecting software safety and reliability standards. Proc. Tenth Annual Conference on Computer Assurance, pp. 223–232.

Hu, A.J., Dill, D.L., Drexler, A.J. & Yang, C.H., 1993, Higher-Level Specification and Verification with BDDs. In Proceedings of Computer Aided Verification: Fourth International Workshop, Lecture Notes in Computer Science, Eds. G. v. Bochmann and D.K. Probst, vol. 663, Springer-Verlag, Berlin.

Ippolito, L.M. & Wallace, D.R., 1995, A Study on Hazard Analysis in High Integrity Software Standards and Guidelines. Gaithersburg, MD: U.S. Dept. of Commerce, National Institute of Standards and Technology, NISTIR 5589.

Johnson, S.K., 1998, Combining QFD and FMEA to optimize performance. ASQ’s 52nd Annual Quality Congress Proceedings, pp. 564–575.

Jordan, W.E., 1972, Failure modes, effects, and criticality analysis. Proceedings Annual Reliability and Maintainability Symposium, pp. 30–37.

Kennedy, M., 1998, Failure modes & effects analysis (FMEA) of flip chip devices attached to printed wiring boards (PWB). 1998 IEEE/CPMT International Electronics Manufacturing Technology Symposium, pp. 232–239.

Knight, J. & Nakano, G., 1997, Software Test Techniques for System Fault-Tree Analysis.

Kukkal, P., Bowles, J.B. & Bonnell, R.D., 1993, Database design for failure modes and effects analysis. 1993 Proceedings Annual Reliability and Maintainability Symposium, pp. 231–239.

Lamport, L. & Lynch, N., 1990, Distributed Computing Models and Methods. In Handbook of Theoretical Computer Science, vol. B, Formal Models and Semantics, J. van Leeuwen, Ed. Cambridge/Amsterdam: MIT Press/Elsevier, 1990, pp. 1157–1199.

Leveson, N.G., 1995, Safeware: System Safety and Computers. Addison-Wesley.


Leveson, N.G., Cha, S.S. & Shimeall, T.J., 1991, Safety Verification of Ada Programs using software fault trees. IEEE Software, 8(7), pp. 48–59, July 1991.

Leveson, N.G. & Stolzy, J.L., 1987, Safety Analysis Using Petri Nets. IEEE Transactions on Software Engineering, vol. 13, no. 3, March 1987.

Leveson, N.G. & Harvey, P.R., 1983, Software fault tree analysis. Journal of Systems and Software, vol. 3, pp. 173–181.

Loftus, C., Pugh, D. & Pyle, I., 1996, Software Failure Modes, Effects and Criticality Analysis. Position Paper.

Luke, S.R., 1995, Failure mode, effects and criticality analysis (FMECA) for software. 5th Fleet Maintenance Symposium, Virginia Beach, VA (USA), 24–25 Oct 1995, pp. 731–735.

Lutz, R.R. & Shaw, H.-Y., 1999a, Applying Adaptive Safety Analysis Techniques. Proceedings of the Tenth International Symposium on Software Reliability Engineering, Boca Raton, FL, Nov 1–4, 1999.

Lutz, R.R. & Woodhouse, R.M., 1999b, Bi-directional Analysis for Certification of Safety-Critical Software. Proceedings, ISACC’99, International Software Assurance Certification Conference, Chantilly, VA, Feb. 28–Mar 2, 1999.

Lutz, R.R. & Woodhouse, R.M., 1998a, Failure Modes and Effects Analysis. Encyclopedia of Electrical and Electronics Engineering, ed. J. Webster, John Wiley and Sons Publishers.

Lutz, R.R. & Shaw, H.-Y., 1998b, Applying Integrated Safety Analysis Techniques (Software FMEA and FTA). JPL D-16168, Nov. 30, 1998.

Lutz, R.R., Helmer, G., Moseman, M., Statezni, D. & Tockey, S., 1998c, Safety Analysis of Requirements for a Product Family. Proc. Third IEEE International Conference on Requirements Engineering.

Lutz, R.R. & Woodhouse, R.M., 1997, Requirements Analysis Using Forward and Backward Search. Annals of Software Engineering, 3, pp. 459–475.

Lutz, R.R. & Woodhouse, R.M., 1996, Experience Report: Contributions of SFMEA to Requirements Analysis. Proceedings of ICRE ’96, April 15–18, 1996, Colorado Springs, CO, pp. 44–51.

Lutz, R.R., 1996a, Targeting safety-related errors during software requirements analysis. Journal of Systems and Software 34, pp. 223–230.

Lutz, R.R., 1996b, Analyzing Software Requirements Errors in Safety-Critical, Embedded Systems. The Journal of Systems and Software.

Lutz, R. & Ampo, Y., 1994, Experience Report: Using Formal Methods for Requirements Analysis of Critical Spacecraft Software. In Proceedings of the 19th Annual Software Engineering Workshop, NASA Goddard Space Flight Center, Greenbelt, MD, pp. 231–236.

Maier, T., 1997, FMEA and FTA to support safe design of embedded software in safety-critical systems. Safety and Reliability of Software Based Systems, Twelfth Annual CSR Workshop, pp. 351–367.

Maskuniitty, M. & Pulkkinen, U., 1994, Fault tree and failure mode and effects analysis of a digital safety function. AVV(94)TR2, Technical Research Centre of Finland, 35 p. + app.

Mazur, M.F., 1994, Software FMECA. Proc. 5th International Symposium on Software Reliability Engineering, Monterey, CA, Nov. 6–9.

McDermid, J.A., Nicholson, M., Pumfrey, D.J. & Fenelon, P., 1995, Experience with the application of HAZOP to computer-based systems. In Proceedings of the 10th Annual Conference on Computer Assurance, IEEE, Gaithersburg, MD, pp. 37–48.


McDermid, J.A. & Pumfrey, D.J., 1994, A development of hazard analysis to aid software design. In COMPASS ’94: Proceedings of the Ninth Annual Conference on Computer Assurance. IEEE/NIST, Gaithersburg, MD, June 1994, pp. 17–25.

McDermott, R.E., Mikulak, R.J. & Beauregard, M.R., 1996, The basics of FMEA. New York: Quality Resources.

Modugno, F., Leveson, N.G., Reese, J.D., Partridge, K. & Sandys, S.D., 1997, Integrated Safety Analysis of Requirements Specifications. Proc Third IEEE International Symposium on Requirements Engineering, pp. 148–159.

Montgomery, T.A. & Marko, K.A., 1997, Quantitative FMEA automation. 1997 Proceedings Annual Reliability and Maintainability Symposium, pp. 226–228.

Montgomery, T.A., Pugh, D.R., Leedham, S.T. & Twitchett, S.R., 1996, FMEA Automation for the Complete Design Process. IEEE Proc Annual Reliability and Maintainability Symposium, Annapolis, MD, Jan 6–10, pp. 30–36.

Moriguti, S., 1997, Software excellence: A total quality management guide. Portland, OR: Productivity Press.

Mullery, G.P., 1987, CORE—a method for COntrolled Requirements Expression. In System and Software Requirements Engineering, R.H. Thayer and M. Dorfman, Eds., pp. 304–313. IEEE Press.

Nakajo, T. & Kume, K., 1991, A Case History Analysis of Software Error Cause-Effect Relationships. IEEE Transactions on Software Engineering, 17, 8, pp. 830–838.

Onodera, K., 1997, Effective techniques of FMEA at each life-cycle stage. 1997 Proceedings Annual Reliability and Maintainability Symposium, pp. 50–56.

Ostrand, T.J. & Weyuker, E.J., 1984, Collecting and Categorizing Software Error Data in an Industrial Environment. The Journal of Systems and Software, 4, pp. 289–300.

Parkinson, H.J., Thompson, G. & Iwnicki, S., 1998, The development of an FMEA methodology for rolling stock remanufacture and refurbishment. IMechE Seminar Publication 1998-20, pp. 55–66.

Parzinger, M.J. & Nath, R., 1997, The effects of TQM implementation factors on software quality. Proceedings of the Annual Meeting Decision Sciences Institute, 2, pp. 834–836.

Passey, R.D.C., 1999, Foresight begins with FMEA. Medical Device Technology, 10(2), pp. 88–92.

Perkins, D., 1996, FMEA for a REaL (resource limited) design community. American Quality Congress Proceedings, American Society for Quality Control, pp. 435–442.

Pfleeger, S.L., 1998, Software engineering: Theory and practice. Upper Saddle River, NJ: Prentice Hall.

Price, C.J. & Taylor, N.S., 1998, FMEA for multiple failures. 1998 Proceedings Annual Reliability and Maintainability Symposium, pp. 43–47.

Price, C.J., 1996, Effortless incremental design FMEA. 1996 Proceedings Annual Reliability and Maintainability Symposium, pp. 43–47.

Price, C.J., Pugh, D.R., Wilson, M.S. & Snooke, N., 1995, The Flame system: Automating electrical failure mode & effects analysis (FMEA). 1995 Proceedings Annual Reliability and Maintainability Symposium, pp. 90–95.

Pries, K.H., 1998, Failure mode & effects analysis in software development. SAE Technical Paper Series No. 982816. Warrendale, PA: Society of Automotive Engineers.


Pugh, D.R. & Snooke, N., 1996, Dynamic Analysis of Qualitative Circuits for Failure Mode and Effects Analysis. Proc Annual Reliability and Maintainability Symposium, pp. 37–42.

Raheja, J., 1991, Assurance Technologies: Principles and Practices. McGraw-Hill.

Raheja, D., 1990, Software system failure mode and effects analysis (SSFMEA), a tool for reliability growth. Proceedings of the Int’l Symp. on Reliability and Maintainability (ISRM’90), Tokyo, Japan (1990), pp. 271–277.

Ratan, V., Partridge, K., Reese, J. & Leveson, N.G., 1996, Safety analysis tools for requirements specifications. Proc Eleventh Annual Conference on Computer Assurance, pp. 149–160.

Reese, J.D., 1996, Software Deviation Analysis. Ph.D. Thesis, University of California, Irvine, 1996.

Reifer, D.J., 1979, Software Failure Modes and Effects Analysis. IEEE Transactions on Reliability, R-28, 3, pp. 247–249.

Ristord, L. & Esmenjaud, C., 2001, FMEA Performed on the SPINLINE3 Operational System Software as part of the TIHANGE 1 NIS Refurbishment Safety Case. CNRA/CSNI Workshop 2001, Licensing and Operating Experience of Computer Based I&C Systems, České Budějovice, September 25–27, 2001.

Roberts, N.H., Vesely, W.E., Haasl, D.F. & Goldberg, F.F., 1981, Fault Tree Handbook. Systems and Reliability Research Office of U.S. Nuclear Regulatory Commission, Jan. 1981.

Roland, H.E. & Moriarty, B., 1990, System Safety Engineering and Management. New York, New York: John Wiley and Sons.

Rushby, J., 1993, Formal Methods and Digital Systems Validation for Airborne Systems. SRI-CSL-93-07.

Russomanno, D.J., Bonnell, R.D. & Bowles, J.B., 1994, Viewing computer-aided failure modes and effects analysis from an artificial intelligence perspective. Integrated Computer-Aided Engineering, 1(3), pp. 209–228.

Saeed, A., de Lemos, R. & Anderson, T., 1995, On the safety analysis of requirements specifications for safety-critical software. ISA Transactions, 34(3), pp. 283–295.

Selby, R.W. & Basili, V.R., 1991, Analyzing Error-Prone System Structure. IEEE Transactions on Software Engineering, 17, 2, pp. 141–152.

Signor, M.C., 2000, The Failure Analysis Matrix: A Usable Model for Ranking Solutions to Failures in Information Systems. A Formal Dissertation Proposal, School of Computer and Information Sciences, Nova Southeastern University.

Sommerville, I., 1996, Software Engineering. Fifth Edition, Addison-Wesley, Reading, MA.

Stamatis, D.H., 1995, Failure mode and effect analysis: FMEA from theory to execution. Milwaukee, WI: ASQC Quality Press.

Stephenson, J., 1991, System Safety 2000: A practical guide for planning, managing and conducting system safety programs. New York, New York: Van Nostrand Reinhold.

Storey, N., 1996, Safety-Critical Computer Systems. Addison-Wesley.

Stålhane, T. & Wedde, K.J., 1998, Modification of safety critical systems: An assessment of three approaches. Microprocessors and Microsystems, 21(10), pp. 611–619.

Talbert, N., 1998, The Cost of COTS: An Interview with John McDermid. Computer, 31, 6, June, pp. 46–52.

Tanenbaum, A.S., 1992, Modern Operating Systems. Prentice-Hall, Englewood Cliffs, NJ.


Taylor, J.R., 1982, Fault Tree and Cause Consequence Analysis for Control Software Validation. Technical report, Risø National Laboratory, Denmark, January 1982.

Tsung, P.W., 1995, The three “F”s in quality and reliability: Fault tree analysis, FMECA, FRACAS. International Symposium on Quality in Non-Ferrous Pyrometallurgy, pp. 115–124.

Vesely, W.E., Goldberg, F.F., Roberts, N.H. & Haasl, D.L., 1981, Fault Tree Handbook. NUREG-0942, U.S. Nuclear Regulatory Commission.

Voas, J., 1998, Recipe for Certifying High Assurance Software. RST Corp.

Wallace, D.R., Ippolito, L.M. & Kuhn, D.R., 1992, High Integrity Software Standards and Guidelines. Gaithersburg, MD: U.S. Department of Commerce, National Institute of Standards and Technology, NIST Special Publication 500-204, September.

Wilson, S.P., Kelly, T.P. & McDermid, J.A., 1995, Safety Case Development: Current Practice, Future Prospects. In R. Shaw, editor, Safety and Reliability of Software Based Systems, Twelfth Annual CSR Workshop, Bruges, Belgium, September 1995. ENCRESS, Springer.

Wunram, J., 1990, A Strategy for Identification and Development of Safety Critical Software Embedded in Complex Space Systems. IAA 90-557, pp. 35–51.

Yang, K. & Kapur, K.C., 1997, Customer driven reliability: Integration of QFD and robust design. 1997 Proceedings Annual Reliability and Maintainability Symposium, pp. 339–345.


Standards & guidelines

Chrysler Corporation, Ford Motor Company & General Motors Corporation, 1995, Potential failure mode and effects analysis (FMEA) reference manual (2nd ed.). AIAG.

DOD, 1980, Military Standard, Procedures for Performing a Failure Mode, Effects and Criticality Analysis. MIL-STD-1629A, U.S. Department of Defense, Washington, D.C. (Cancelled 4.8.1998).

DOD, 1984, Military Standard, System safety program requirements. MIL-STD-882B, U.S. Department of Defense, Washington, D.C.

Electronic Industries Association, 1983, System Safety Analysis Techniques. Safety Engineering Bulletin No. 3-A, Engineering Department, Washington, D.C.

Electronic Industries Association, 1990, System Safety Engineering in Software Development. Safety Engineering Bulletin No. 6-A, Washington, D.C.

IEC, 2001, Analysis techniques for system reliability—Procedure for failure mode and effects analysis (FMEA). IEC 60812, Ed. 2, 56/735/CD, 27.4.2001.

IEEE, 1990, Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990, IEEE, New York.

JPL, 1990, Project Reliability Group Reliability Analyses Handbook. D-5703, Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA.

JPL, 1998, Earth Observing System Microwave Limb Sounder, System-Level Failure Modes, Effects, and Criticality Analysis. JPL D-16088, Version 1.0, September 1, 1998.

JPL, 1995, Software Failure Modes & Effects Analysis. Software Product Assurance Handbook, Jet Propulsion Laboratory D-8642.

MOD, 1993, Draft Defence Standard 00-56: Safety Management Requirements for Defence Systems Containing Programmable Electronics. UK Ministry of Defence, Feb. 1993.

MOD, 1991, Hazard Analysis and Safety Classification of the Computer and Programmable Electronic System Elements of Defence Equipment. Interim Defence Standard 00-56, Issue 1, Ministry of Defence, Directorate of Standardization, Kentigern House, 65 Brown Street, Glasgow, G2 8EX, UK, April 1991.

NASA, FEAT (Failure Environment Analysis Tool). NASA Software Technology Transfer Center, Cosmic #MSC-21873 and #MSC-22446.

NASA, FIRM (Failure Identification and Risk Management Tool). NASA Software Technology Transfer Center, Cosmic #MSC-21860.

NRC, 1981, Fault Tree Handbook. W. Vesely. Nuclear Regulatory Commission, Washington D.C., 1981. NUREG-0942.

RTCA, Inc., 1992, Software Considerations in Airborne Systems and Equipment Certification. RTCA/DO-178B.

SAE, 1996, Aerospace Recommended Practice: Certification Considerations for Highly-Integrated or Complex Aircraft Systems. ARP 4754.

SAE, 1996, Aerospace Recommended Practice: Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment. ARP 4761.

SAE, 2000, Potential Failure Mode and Effects Analysis in Design (Design FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing and Assembly Processes (Process FMEA) Reference Manual. SAE J-1739.

SAG, 1996, Failure Modes and Effects Analysis and Failure Modes, Effects, and Criticality Analysis Procedures.

System Safety Society, 1993, System Safety Analysis Handbook.


APPENDIX 1 EXAMPLE OF THE FMEA-WORKSHEET (IEC 60812)

Header fields: Indenture level; Sheet no; Operating mode; Design by; Item; Revision; FMEA prepared by; Approved by.

Worksheet columns: Item ref. | Item description/function | Failure entry code | Failure mode | Possible failure causes | Symptom detected by | Local effect | Effect on unit output | Compensating provision against failure | Severity class | Failure rate F/Mhr | Data source | Recommendations and actions taken.


APPENDIX 2 AN EXAMPLE OF A SWFMEA FORM (TÜV NORD)

Header field: Subsystem: __________

Form columns: Ref-No | Component | Fault | Cause | Failure effect | Safety