Real-Time & Embedded Systems
Agenda
Safety Critical Systems
Project 6 continued
(c) Copyright 2012 Dr. Phillip A. LaPlante
Safety Critical Systems
“Safe enough” looks different at 35,000 feet.
– Bruce Powell Douglass
“The Air Force has a perfect operating record … everything we put in the air has come back down.”
- Unknown
Ubiquity of Control Systems
Electro-mechanical devices are migrating to software-driven systems
Automobiles
Planes
Home Appliances
Medical Equipment
Nuclear Power Plants
Software Failures
Therac-25
Radiation therapy device
Software-driven
Bugs allowed massive radiation overdoses
Killed 3 people, contributed to the death of a fourth
Software Failures
Patriot Missiles
Clock drift reduced their effectiveness from 95% to 13%
Allowed a SCUD missile through defense perimeter
Killed 29, injured 97
Aegis tracking system
Failure contributed to the shooting down of an Iranian airline flight
290 lives lost
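The Patriot clock drift can be reproduced with a little arithmetic. The sketch below follows the widely cited account of the failure, which is not part of these slides and is taken here as an assumption: time was counted in tenths of a second, and 0.1 was stored in fixed point with 23 fractional bits. The 100-hour uptime and the approximate target speed are likewise assumptions from that account.

```python
# Assumed figures from the commonly cited account of the Patriot failure
# (not taken from the slides): 23 fractional bits, 100 hours of uptime,
# and a target speed of roughly 1676 m/s.
FRACTION_BITS = 23
UPTIME_HOURS = 100
TARGET_SPEED_M_PER_S = 1676

# 0.1 has no exact binary representation, so it is chopped to 23 bits.
stored_tenth = int(0.1 * 2**FRACTION_BITS) / 2**FRACTION_BITS
error_per_tick = 0.1 - stored_tenth          # seconds lost per 0.1 s tick

ticks = UPTIME_HOURS * 3600 * 10             # number of 0.1 s ticks
clock_error_s = error_per_tick * ticks       # accumulated drift in seconds
tracking_error_m = clock_error_s * TARGET_SPEED_M_PER_S

print(f"drift after {UPTIME_HOURS} h: {clock_error_s:.3f} s")
print(f"tracking error: {tracking_error_m:.0f} m")
```

Under these assumptions a per-tick error of under a ten-millionth of a second grows to about a third of a second, which at missile speeds is hundreds of meters.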
Software Failures
8080-based factory control software
Mistakenly stacked large boulders 80 feet high
Crushed cars and damaged a building
Robotics
Stray EM interference blamed for 19 deaths
Cardiac pacemakers
Low-energy radiation reprogrammed the devices
Caused several deaths
Software Failures
Medical Database Software
Incorrectly informed woman she had incurable syphilis and had passed it on to her children
She strangled one, attempted to kill another and herself
Sunlight Filtering Software
Failed to remove false missile detections based on sunlight reflecting off clouds
A Soviet Commander averted nuclear war based on a “… funny feeling in my gut.”
Terms
Reliability – the measure of up-time, or availability, of a system
The probability that a task will complete before the system fails
Measured in Mean Time Between Failures (MTBF)
Security – permitting system access only to authorized and authenticated persons
Safety – the system does not incur too much risk to persons or property
Risk – the chance that something bad will happen
Common-mode failure – a single failure results in the failure of multiple control paths
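The two readings of reliability above can be put into numbers. This is a minimal sketch under two assumptions the slide does not state: failures arrive at a constant rate (an exponential model), and repairs take a mean time to repair, here called `mttr_hours`. The function names are illustrative.

```python
import math

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up (steady-state availability)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def mission_reliability(task_hours: float, mtbf_hours: float) -> float:
    """Probability that a task of the given length completes before a failure,
    assuming a constant failure rate of 1 / MTBF."""
    return math.exp(-task_hours / mtbf_hours)

print(availability(1000.0, 2.0))           # up-time fraction
print(mission_reliability(10.0, 1000.0))   # chance a 10-hour task completes
```

Note that a high MTBF alone says nothing about safety: a system can fail rarely and still fail dangerously.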
Fundamental Hazards
Release of energy
Release of toxins
Interference with life-support functions
Supplying misleading information to safety personnel or control systems
Failure to alarm when hazardous conditions arise
Failure to limit or act when unwanted events occur, inputs are flawed or outputs are outside correct levels
System Issues
Safety is a system issue
Multiple solutions may address a concern
Interlocks
Redundant hardware
Redundant software
The interaction of the components determines the safety of the system
Software Failures
Software does not fail
Failures represent a change in the capability of the system
Broken switch
Failed component
Bad sensor
If software does something wrong, it does it every time!
Software may respond poorly to failures
Single-point Failures
A device is considered safe if a single failure in the system does not result in an unsafe condition
Single-point assessments tree:
Fail-Safe State
A condition a safety-critical system must attain when an unrecoverable fault occurs.
Emergency Stop
Partial Shutdown
Hold
Manual Control
Restart
Driven by the problem domain needs
Fail-Safe States
An airliner jet engine fails?
Unmanned space vehicle launch?
Attended medical devices?
Hazardous area robotics?
Unmanned aircraft control failure?
Cruise ship rudder failure?
Achieving Safety
Separation of safety channels from non-safety channels
Firewall pattern
Any component failure in the channel fails the entire channel
Isolation of safety systems from non-safety systems is common and justifiable
Redundancy
Small or large scale
Homogeneous or diverse
Achieving Safety
Homogeneous
Channels are replicated verbatim
Detects only faults, not errors
Inexpensive
Diverse
A different channel is implemented
Detects faults and errors
More expensive
Achieving Safety
Diverse redundancy is stronger
Protects against systemic faults / errors
Data corruption detection
Parity bit
Hamming codes (parity bits)
Checksums
CRCs
Redundant storage
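The detection codes listed above differ in what they catch. The sketch below is illustrative only: the message contents and helper names are made up, and `zlib.crc32` from the Python standard library stands in for whatever CRC a real device would use.

```python
import zlib

def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2

def additive_checksum(data: bytes) -> int:
    """Simple 8-bit checksum: sum of all bytes modulo 256."""
    return sum(data) % 256

message = b"VALVE=CLOSED"
stored_crc = zlib.crc32(message)          # saved alongside the data

corrupted = b"VALVE=CLOSEE"               # one character changed in transit
crc_detects = zlib.crc32(corrupted) != stored_crc

# An additive checksum cannot see reordered bytes; the CRC catches this swap.
swapped = b"VALVE=CLOSDE"
checksum_misses_swap = additive_checksum(swapped) == additive_checksum(message)
crc_catches_swap = zlib.crc32(swapped) != stored_crc
```

A single parity bit is cheaper still, but it misses any error that flips an even number of bits.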
Achieving Safety
Reasonableness checks
A second algorithm validating the results of the first
Usually much simpler
Feedback error detection
Identify potential fault conditions
May cause a fail-safe transition
Feedback error correction
Identify and correct potential fault conditions
Attempts to keep the system operating, and may reduce capability
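A reasonableness check and the two feedback strategies can be sketched together. Everything here is hypothetical: the temperature limits, the function names, and the choice to fall back to the last good sample are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical limits for a temperature sensor; values are illustrative.
MIN_TEMP_C = -40.0
MAX_TEMP_C = 150.0
MAX_STEP_C = 5.0     # largest believable change between two samples

def is_reasonable(reading: float, previous: Optional[float]) -> bool:
    """Range check plus rate-of-change check on a new sample."""
    if not (MIN_TEMP_C <= reading <= MAX_TEMP_C):
        return False
    if previous is not None and abs(reading - previous) > MAX_STEP_C:
        return False
    return True

def detect(reading: float, previous: Optional[float]) -> str:
    """Feedback error detection: report the fault so the caller can go fail-safe."""
    return "OK" if is_reasonable(reading, previous) else "FAIL_SAFE"

def correct(reading: float, previous: Optional[float]) -> Optional[float]:
    """Feedback error correction: keep running on the last good value
    (reduced capability) when the new sample is not believable.
    Returns None if there is no earlier good value to fall back on."""
    if is_reasonable(reading, previous):
        return reading
    return previous
```

The check is far simpler than the control algorithm it guards, which is the point: it is easy to review and unlikely to share the primary algorithm's defects.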
Safety Architectures
Single-Channel Protected Design
A single flow of control
A break in the channel induces a failure
Safeguards are added to ensure correct fail-safe behavior
A single point of failure
Multi-channel Voting Pattern
An odd number of redundant channels
Each channel “votes” on the task
Majority rules
Homogeneous or diverse
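The voting step can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names are made up, and treating "no majority" as an error that the caller handles is an assumption.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of channels.

    Raises ValueError when no value has a strict majority, which the caller
    would treat as a fault (for example by entering a fail-safe state).
    """
    if not outputs:
        raise ValueError("no channel outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among channels")
    return value

def disagreeing_channels(outputs):
    """Indices of channels that lost the vote, for logging or repair."""
    winner = majority_vote(outputs)
    return [i for i, out in enumerate(outputs) if out != winner]
```

With three channels, `majority_vote(["open", "closed", "open"])` yields `"open"` and channel 1 is flagged; an odd channel count avoids ties between two candidate values.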
Safety Architectures
Homogeneous Redundancy Pattern
Identical channels run in parallel
If an odd number of channels:
Majority channels detect and correct minority channels
Must be fully redundant
Inexpensive to design (one channel is designed, then copied)
Detects only faults, not errors
May be expensive to build due to redundant hardware
Safety Architectures
Diverse Redundancy Pattern
Redundant, but uniquely implemented channels
Different but equal
Lightweight redundancy
Separation of monitoring and actuation
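"Different but equal" can be illustrated with two independently written routines that must agree. The integer square root is only a stand-in for a real control computation, and all names below are hypothetical.

```python
def isqrt_newton(n: int) -> int:
    """Channel A: integer square root by Newton's iteration."""
    if n < 0:
        raise ValueError("negative input")
    if n < 2:
        return n
    x = n
    y = (x + n // x) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

def isqrt_bisect(n: int) -> int:
    """Channel B: same result, found by binary search instead."""
    if n < 0:
        raise ValueError("negative input")
    lo, hi = 0, n + 1          # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo

def diverse_isqrt(n: int) -> int:
    """Run both channels and accept the answer only if they agree."""
    a, b = isqrt_newton(n), isqrt_bisect(n)
    if a != b:
        raise RuntimeError(f"channel mismatch: {a} != {b}")
    return a
```

Because the two channels share no algorithm, a design error in one is unlikely to be mirrored in the other, which is what identical copies cannot offer.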
Safety Architectures
Watchdog Pattern
A secondary process monitors the primary process
Primary process periodically “feeds” the secondary process
Secondary process can alarm or restart should the primary process fail
May include a periodic test suite
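A software model of the feed-and-check cycle might look like the following. It is a sketch under assumptions: the class name, the `on_expire` callback, and the injectable `clock` are all illustrative, and a real embedded watchdog is usually a hardware timer rather than a Python object.

```python
import time

class Watchdog:
    """Secondary monitor that expects to be fed within `timeout_s` seconds."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire      # alarm or restart action
        self.clock = clock              # injectable so the logic can be tested
        self.last_fed = clock()
        self.expired = False

    def feed(self):
        """Called periodically by the primary process to show it is alive."""
        self.last_fed = self.clock()

    def check(self):
        """Called periodically by the secondary process.
        Returns True while the primary is considered alive."""
        if not self.expired and self.clock() - self.last_fed > self.timeout_s:
            self.expired = True
            self.on_expire()
        return not self.expired
```

The expiry action fires once; whether it raises an alarm or restarts the primary depends on the fail-safe state chosen for the system.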
Safety Architectures
Safety Executive Pattern
A centralized coordinator for monitoring safety
A really smart watchdog
Watchdog timeouts
Software error assertions
Continuous or periodic built-in tests
Faults identified by monitors
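One way to picture the safety executive is as a coordinator that polls several fault sources and owns the single transition to the fail-safe state. The class, state names, and monitor callbacks below are hypothetical and only sketch the idea.

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"
    FAIL_SAFE = "fail_safe"

class SafetyExecutive:
    """Central coordinator: polls registered monitors and owns the
    transition to the fail-safe state."""

    def __init__(self, enter_fail_safe):
        self.monitors = {}          # name -> callable returning True if healthy
        self.enter_fail_safe = enter_fail_safe
        self.state = State.RUNNING
        self.faults = []

    def register(self, name, is_healthy):
        self.monitors[name] = is_healthy

    def poll(self):
        """Run every monitor once; on any fault, go fail-safe exactly once."""
        for name, is_healthy in self.monitors.items():
            try:
                healthy = is_healthy()
            except Exception:       # a crashing monitor counts as a fault
                healthy = False
            if not healthy and name not in self.faults:
                self.faults.append(name)
        if self.faults and self.state is State.RUNNING:
            self.state = State.FAIL_SAFE
            self.enter_fail_safe(list(self.faults))
        return self.state
```

Watchdog timeouts, assertion failures, and built-in test results would each be registered as one monitor, so the policy for reacting to them lives in one place.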
Safety Architectures
Monitor-Actuator Pattern
Separation of algorithms
Actuation performs the actions
Monitoring tracks the actions
Additional cost and complexity
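The split between acting and checking can be sketched with a made-up heater. The classes, the 0.0 to 1.0 power scale, and the temperature limit are assumptions for illustration; the essential feature is that the monitor judges the outcome with its own measurement rather than trusting the actuator's view.

```python
class Heater:
    """Actuation channel: performs the action (hypothetical heater model)."""

    def __init__(self):
        self.power = 0.0            # commanded power, 0.0 to 1.0
        self.enabled = True

    def command(self, power):
        """Apply a power command, clamped to the valid range."""
        if self.enabled:
            self.power = max(0.0, min(1.0, power))

    def shut_down(self):
        """Force the actuator off and ignore further commands."""
        self.enabled = False
        self.power = 0.0

class TemperatureMonitor:
    """Monitoring channel: checks the effect with its own sensor and limit."""

    def __init__(self, heater, limit_c):
        self.heater = heater
        self.limit_c = limit_c

    def check(self, measured_c):
        """Return True if safe; otherwise force the actuator off."""
        if measured_c > self.limit_c:
            self.heater.shut_down()
            return False
        return True
```

The second channel is the additional cost and complexity the slide mentions, but it is much lighter than duplicating the whole actuation path.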
Eight Steps to Safety
Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test (Peer Reviews!)
Identify the Hazards
Identify the hazard
Determine the level of risk
Determine the tolerance time
Determine the source of the hazard:
The fault leading to the hazard
The likelihood of the fault
The fault detection time
The means by which the hazard is handled:
The means
The fault reaction (exposure time)
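The timing entries in this list can be checked against each other. The sketch below assumes the usual rule that worst-case detection time plus reaction time must stay below the tolerance time; the rule is not spelled out on the slide, and the record layout and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hazard:
    name: str
    tolerance_time_s: float     # how long the hazard can persist without harm
    detection_time_s: float     # worst-case time to detect the fault
    reaction_time_s: float      # worst-case time to reach a safe condition

    def handling_time_s(self) -> float:
        """Worst-case time from fault occurrence to a safe condition."""
        return self.detection_time_s + self.reaction_time_s

    def is_covered(self) -> bool:
        """The hazard is handled in time only if handling < tolerance."""
        return self.handling_time_s() < self.tolerance_time_s

def uncovered(hazards):
    """Names of hazards whose handling would come too late."""
    return [h.name for h in hazards if not h.is_covered()]
```

Running `uncovered` over the whole hazard list gives a quick review item: any name it returns needs a faster detector, a faster reaction, or a different safety measure.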
Identify the Hazards
Patient Ventilator Example:
Fault Analysis
Fault-tree analysis (FTA)
Identify the hazards
Work backward from the hazard to identify the causal conditions
Diagram with a boolean flow chart
UML Activity diagram
Failure mode effect analysis (FMEA)
Identify potential faults
Work forward to the consequences
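A fault tree is a boolean expression over basic faults, so it can be evaluated directly. The tree below is a made-up example, and the tuple encoding and helper names are assumptions chosen to keep the sketch short.

```python
def AND(*children):
    """Gate that fires only if every child event occurs."""
    return ("AND", children)

def OR(*children):
    """Gate that fires if any child event occurs."""
    return ("OR", children)

def occurs(node, active_faults):
    """Evaluate whether the event at `node` occurs given the active basic faults."""
    if isinstance(node, str):               # leaf: a basic fault
        return node in active_faults
    gate, children = node
    results = [occurs(child, active_faults) for child in children]
    return all(results) if gate == "AND" else any(results)

def single_point_failures(tree, basic_faults):
    """Basic faults that cause the hazard on their own."""
    return [f for f in basic_faults if occurs(tree, {f})]

# Hypothetical tree: overheating requires a heater stuck on AND the loss
# of protection (sensor failed OR cutoff relay welded shut).
overheat = AND("heater_stuck_on", OR("sensor_failed", "relay_welded"))
```

Working backward from the hazard gives the tree; evaluating it forward for each individual fault, as `single_point_failures` does, is a small-scale version of the FMEA direction.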
Determine the Risks
FDA levels of concern
Minor – not expected to result in injury or death
Moderate – results in minor to moderate injury
Major – results in major injury or death
German TUV characterization
(S) Severity of the risk
(E) Duration of the period of exposure
(G) Prevention of the danger
(W) Probability of occurrence
Determine the Risks
German TUV characterization
Determine the Risks
German TUV Example
Define the Safety Measure
Obviation – make the hazard physically impossible
Education – user training
Alarming – announce the hazard so action can be taken
Interlocks – the hazard is removed by a secondary device or logic that intercedes
Internal Checking – the system detects and handles the malfunction prior to an incident
Safety Equipment – goggles, gloves, etc.
Restriction of access – access to potential hazards is restricted to trained personnel
Labeling – e.g., “High Voltage, do not touch”
Create Safe Requirements
Consider the requirements from a safety perspective
Specify the negations
The system shall not move hardware before user input
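A negated requirement translates naturally into a guard that refuses the unsafe action. The controller below is hypothetical, and requiring a fresh confirmation before every move is a design choice made for this sketch, not something the requirement itself dictates.

```python
class MotionController:
    """Hypothetical controller enforcing: the system shall not move
    hardware before user input."""

    def __init__(self):
        self.user_confirmed = False
        self.position = 0

    def confirm(self):
        """Record explicit user input that permits motion."""
        self.user_confirmed = True

    def move_to(self, target):
        """Move only if the user has confirmed; otherwise refuse loudly."""
        if not self.user_confirmed:
            raise PermissionError("motion requested before user input")
        self.position = target
        self.user_confirmed = False     # each move needs fresh confirmation
```

Stating the requirement as a negation also gives the test team something concrete to attack: a test passes only if the unconfirmed move is rejected and the position is unchanged.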
Create Safe Designs
Work from safe requirements
Adopt a safe architecture
Revisit, revise the hazard analysis during development
Select measures that provide appropriate levels of detection and correction
Ensure independent channels lack common-mode failures
Adopt consistent strategies for handling faults
Include POST and periodic run-time tests
Implementing Safety
Language Choice
Strong compile-time checking
Strong run-time checking
Support for encapsulation and abstraction (but not “just because”)
Exception handling
“Safe” language constructs
Void*?
Assure the Safety Process
Continuously track against hazard analysis
Utilize peer reviews to assure quality
Verify design adherence
Verify coding standards
Identify how each hazard is handled
Test, test, test
Black box testing
White box testing
Monkey testing
Fault seeding
Load testing
Simulations
System testing
Unit testing