DEPENDABLE PROCESSOR DESIGN
Matteo Carminati
Politecnico di Milano - October 31st, 2012
Partially inspired by P. Harrod's (ARM) presentation at the Test Spring School 2012 - Annecy (France)
DEPENDABLE PROCESSOR DESIGN - Matteo Carminati
OUTLINE
What?
• Problem Statement
• Preliminary Definitions
Where?
• Fields of Interest
• Standards
Why?
• Pursued Objectives
• State of the Art
How?
• Innovative Solutions
PROBLEM STATEMENT
Guarantee that a system will:
• Match specifications
• Fulfill requirements
• Meet constraints
• Provide real-time response
even when faults occur!
We want the system to be dependable.
DEPENDABILITY
“It is that property of a computer system such that reliance can justifiably be placed on the service it delivers.” - J.C. Laprie [6]
Dependability is an abstraction comprising a plethora of quantities:
• Reliability
• Availability
• Safety
• Integrity
• Maintainability
• Testability
RELIABILITY
Probability that the system will operate correctly in a specified operating
environment up until time t.
R(t) = P(not failed during [0, t])
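As a minimal sketch, R(t) can be evaluated in closed form if one assumes a constant failure rate (the exponential model). That assumption, and the names `reliability` and `failure_rate`, are illustrative and do not come from the slides.

```python
import math


def reliability(t_hours: float, failure_rate: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate per hour."""
    if t_hours < 0 or failure_rate < 0:
        raise ValueError("time and failure rate must be non-negative")
    return math.exp(-failure_rate * t_hours)


# Hypothetical component that fails once every 10,000 hours on average.
lam = 1e-4
for t in (0, 1_000, 10_000):
    print(f"R({t}) = {reliability(t, lam):.4f}")
```

Under this model the reliability only decreases with time, since no repair is considered.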
AVAILABILITY
Probability that the system will be operational at time t.
A(t) = P(not failed at time t)
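A(t) is a point-in-time quantity. A common simplification, assumed here and not stated in the slides, is to look at its long-run value for a repairable system that alternates between working and being repaired. The names `mttf_hours` (mean time to failure) and `mttr_hours` (mean time to repair) are illustrative.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: one failure every 10,000 h, repaired in 10 h.
a = steady_state_availability(10_000, 10)
print(f"A = {a:.5f}")
```

Unlike reliability, availability can stay high for a system that fails often, provided it is repaired quickly.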
SAFETY
The absence of undesired and unplanned events that result in a specific level of loss (i.e., accidents).
INTEGRITY
The absence of improper system state alterations.
MAINTAINABILITY
Probability that a failed system will be repaired within time t.
M(t) = P(repaired during [0, t])
TESTABILITY
The ability to test for certain attributes within a system.
Related to maintainability: it is important to minimize the time required to identify and locate specific problems.
These quantities have different, sometimes contradictory, goals: the best trade-off among them must be found when designing a new electronic system.
ROBUSTNESS
Ability of a system to continue functioning despite the presence of faults, even if the system performance may be altered (always in a safe way), until the faults are corrected.
FUNCTIONAL SAFETY
Absence of unreasonable risk due to hazards caused by malfunctioning behavior of electronic systems.
WHAT IS A FAULT?
[Figure: fault timeline. Phases: Fault Free, Latency, Outage, Fault Free. Events: Fault, Error, Detection, Repair, Recovery]
• FAULT: a defect within the system
• ERROR: a deviation from the required operation
• FAILURE: the system fails to perform its required function
FAULT TAXONOMY
FAULT
• SYSTEMATIC: HW, SW
• RANDOM: HW
  - PERMANENT (hard): shorts, stuck-at, stuck-open
  - INTERMITTENT
  - TRANSIENT (soft): SEE, SBU, MBU, SET
WHAT IS AN ACCIDENT?
[Figure: several Fault → Error → Failure chains leading to a Hazard, which may lead to an Accident]
HAZARD: a state of the system that in certain environmental situations may lead to an accident
SAFETY INTEGRITY LEVEL
The relative level of risk reduction provided by a safety function.
SIL - IEC 61508

SFF            HFT 0    HFT 1    HFT 2
< 60%          -        SIL 1    SIL 2
60% - 90%      SIL 1    SIL 2    SIL 3
90% - 99%      SIL 2    SIL 3    SIL 4
> 99%          SIL 3    SIL 4    SIL 4

Safe Failure Fraction (SFF): ratio between the sum of safe failures plus detected dangerous failures and the sum of safe failures plus all dangerous failures.
Hardware Fault Tolerance (HFT): an HFT of N means that N+1 faults could cause a loss of the safety function.
From [1].
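The SFF definition and the SFF/HFT table on this slide can be sketched as a small lookup. The function names are illustrative, and the handling of values that fall exactly on a band boundary (60%, 90%, 99%) is an assumption, since the slide does not specify it.

```python
from typing import Optional

# Highest SIL for each SFF band and HFT value (0, 1, 2), as in the slide table.
# None marks the combination shown as "-" (not allowed).
_BANDS = [
    (0.60, (None, 1, 2)),   # SFF < 60%
    (0.90, (1, 2, 3)),      # 60% <= SFF < 90%  (boundary handling assumed)
    (0.99, (2, 3, 4)),      # 90% <= SFF < 99%
]
_TOP_BAND = (3, 4, 4)       # SFF >= 99%


def safe_failure_fraction(safe: float, dangerous_detected: float,
                          dangerous_undetected: float) -> float:
    """SFF = (safe + detected dangerous) / (safe + all dangerous)."""
    total = safe + dangerous_detected + dangerous_undetected
    if total <= 0:
        raise ValueError("at least one failure rate must be positive")
    return (safe + dangerous_detected) / total


def max_sil(sff: float, hft: int) -> Optional[int]:
    """Highest SIL reachable for a given SFF (0..1) and HFT (0, 1 or 2)."""
    if not 0.0 <= sff <= 1.0 or hft not in (0, 1, 2):
        raise ValueError("sff must be in [0, 1] and hft in {0, 1, 2}")
    for upper_bound, row in _BANDS:
        if sff < upper_bound:
            return row[hft]
    return _TOP_BAND[hft]


# Hypothetical failure rates (any consistent unit, e.g. FIT).
sff = safe_failure_fraction(safe=50, dangerous_detected=45,
                            dangerous_undetected=5)
print(f"SFF = {sff:.2%}, max SIL with HFT 1 = {max_sil(sff, 1)}")
```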
FAULT-RELATED PROPERTIES
• Fault Ignore: the fault needs to be neither detected nor mitigated.
• Fault Detection: the result can be incorrect, but the fault must be identified.
• Fault Tolerance: the fault must be mitigated and the provided result must be correct.
• Fault Diagnosis: the result must be correct and the faulty unit must be identified.
CRITICAL SYSTEMS - STANDARDS
• MISSION
  - Aerospace: DO-178B/DO-254
  - Railway: EN 50128
• SAFETY
  - Nuclear power stations: IEC 60880
  - Medical devices: IEC 60601
  - Automotive: ISO 26262
• BUSINESS
  - Account management
  - Transaction systems
ISO 26262: FOCUS
• Braking: ABS, anti-skid, ...
• Engine management, power train
• Driver assistance, lane departure
• Passenger safety, air bags
• Electric/hybrid energy system
From [1].
ISO 26262: REQUIREMENTS
• Architecture compliance: measures to achieve system safety in case of random HW failures.
• Process compliance: guidelines for designing processes and HW/SW architectures to avoid systematic failures.
ISO 26262: ASIL (AUTOMOTIVE SAFETY INTEGRITY LEVEL)
[Table: correspondence between IEC 61508 and ISO 26262 levels, with application examples; columns listed top to bottom]
• IEC 61508: SIL 4, SIL 3, SIL 2, SIL 1
• ISO 26262: -, ASIL D, ASIL C, ASIL B, ASIL A
• Application Example: Railway signal control; Brake-by-wire, EPS, ...; Battery management; Rear lights; Automotive dashboard
Example: ASIL D means that more than 99% of faults must be detected and that the probability of violating a safety goal due to random HW failures shall be less than 10 FIT (1 FIT = 1 failure in 1 billion hours).
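To give a feel for the FIT unit, the sketch below converts a FIT value into a failure rate per hour and into the probability of at least one failure over a given operating time. The constant-failure-rate model and the 8,000-hour lifetime are assumptions made for illustration only.

```python
import math

HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 hours of operation


def fit_to_rate_per_hour(fit: float) -> float:
    """Convert a FIT value into failures per hour."""
    return fit / HOURS_PER_FIT


def failure_probability(fit: float, hours: float) -> float:
    """P(at least one failure within `hours`), assuming a constant rate."""
    # -expm1(-x) computes 1 - exp(-x) accurately for very small x.
    return -math.expm1(-fit_to_rate_per_hour(fit) * hours)


# Hypothetical operating lifetime of 8,000 hours.
p = failure_probability(10, 8_000)
print(f"10 FIT over 8,000 h: P(failure) = {p:.2e}")
```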
ISO 26262: HOW TO ACHIEVE THE ASIL
• Setting up functional safety management
• Defining the safety goal
• Improving the process and improving the product, in order to:
  - Avoid systematic failures
  - Detect/tolerate HW random failures
  - Avoid/detect dependent failures
NON-CRITICAL SYSTEMS
• Domestic appliances
• Entertainment devices
• Distribution networks
• Wellness
TRADE-OFF: Dependability vs. Performance vs. Power
GOALS
Design dependable processors to:
• Reduce the number of hazards and accidents
• Increase system safety
• Meet standards
STATE OF THE ART
ARCHITECTURAL APPROACH
• 1oo1 (single unit A): Rs = Ra
• 1oo2 (units A and B, at least one must work): Rs = 1 - (1 - Ra) x (1 - Rb)
• 2oo2 (units A and B, both must work): Rs = Ra x Rb
• 2oo3 (units A, B, and C with voting): Rs = RaRb + RbRc + RaRc - 2 RaRbRc
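The reliability of an MooN arrangement can be checked numerically by enumerating every combination of working and failed units. This sketch reads MooN as "at least M of the N units must work" and assumes the units fail independently; both are assumptions, and the function name is illustrative.

```python
from itertools import product
from typing import Sequence


def moon_reliability(m: int, unit_reliabilities: Sequence[float]) -> float:
    """Probability that at least m of the independent units are working."""
    n = len(unit_reliabilities)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the number of units")
    total = 0.0
    # Each state is a tuple of booleans: True = unit works, False = failed.
    for states in product((True, False), repeat=n):
        if sum(states) >= m:
            probability = 1.0
            for works, r in zip(states, unit_reliabilities):
                probability *= r if works else (1.0 - r)
            total += probability
    return total


r = 0.9  # hypothetical reliability of every unit
for label, m, n in (("1oo1", 1, 1), ("1oo2", 1, 2),
                    ("2oo2", 2, 2), ("2oo3", 2, 3)):
    print(f"{label}: Rs = {moon_reliability(m, [r] * n):.3f}")
```

With identical units, 1oo2 is the most reliable of these arrangements and 2oo2 the least; 2oo3 sits in between while also being able to outvote a single faulty unit.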
STATE OF THE ART
DIVERSITY
“Different solutions satisfying the same requirement with the aim of independence” - ISO 26262
• Reduces HW systematic failures
• Prevents, reduces, or detects common-cause failures, replacing the need for complex measures
DUAL CORE LOCK-STEP
Homogeneous Redundancy
[Figure: master CPU and checker CPU running the same SW, with a comparator (COMP) on their outputs]
Advantages:
• High diagnostic coverage
• Negligible SW overhead
Drawbacks:
• Significant HW overhead
• Significant power consumption increase
• Susceptible to common-cause and HW systematic failures
• Poor diagnostic info and availability
Achieves ASIL D
Example: Freescale MPC5746M
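The lock-step idea can be mimicked in software as a rough sketch: the same step is executed twice and a comparator flags any disagreement. Real lock-step cores compare signals in hardware every cycle; the names `run_lockstep`, `LockstepError`, and the `checker_fault_mask` fault-injection hook are hypothetical.

```python
from typing import Callable


class LockstepError(Exception):
    """Raised when the master and checker outputs disagree."""


def run_lockstep(step: Callable[[int], int], value: int,
                 checker_fault_mask: int = 0) -> int:
    """Run `step` on both cores and compare the two results.

    `checker_fault_mask` emulates a hardware fault: its set bits are
    flipped in the checker output before the comparison.
    """
    master_out = step(value)
    checker_out = step(value) ^ checker_fault_mask
    if master_out != checker_out:
        raise LockstepError(f"mismatch: {master_out:#x} != {checker_out:#x}")
    return master_out


def increment(x: int) -> int:
    """Toy 8-bit operation standing in for one step of the program."""
    return (x + 1) & 0xFF


print(run_lockstep(increment, 41))           # fault-free run
try:
    run_lockstep(increment, 41, checker_fault_mask=0x04)
except LockstepError as err:
    print("fault detected:", err)
```

The comparator only reports that the two outputs differ, not which core is wrong, and a fault that corrupts both cores in the same way passes unnoticed. This matches the poor diagnostic information and the common-cause weakness listed on the slide.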
CHALLENGE & RESPONSE
SW cross-exchange between 2 independent units
[Figure: main CPU running SW1 and secondary CPU running SW2, connected by a serial interface]
Advantages:
• Common-cause failures detection
• HW/SW systematic failures detection
Drawbacks:
• Significant HW and power consumption increase
• Significant SW and performance overhead
• Poor transient fault coverage, reusability, and availability
• High error detection latency
Achieves ASIL C
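A challenge and response exchange can be sketched as follows: the secondary unit sends a value, the main unit must return the agreed transformation of it, and repeated wrong answers lead to a safe-state request. The response function, the error threshold, and all names below are hypothetical; the slide only states that the two units cross-exchange data in software.

```python
from typing import Callable


def expected_response(challenge: int) -> int:
    """Hypothetical agreed-upon function; real systems define their own."""
    return (challenge * 3 + 7) & 0xFF


class Monitor:
    """Secondary unit: issues challenges and checks the main unit's answers."""

    def __init__(self, max_errors: int = 3) -> None:
        self.max_errors = max_errors
        self.errors = 0
        self._next_challenge = 0

    def check(self, main_unit: Callable[[int], int]) -> bool:
        """Send one challenge; return False when a safe state is required."""
        challenge = self._next_challenge
        self._next_challenge = (self._next_challenge + 1) & 0xFF
        if main_unit(challenge) == expected_response(challenge):
            self.errors = 0
        else:
            self.errors += 1
        return self.errors < self.max_errors


def faulty_main(challenge: int) -> int:
    """Main unit with an emulated fault: one bit of the answer is flipped."""
    return expected_response(challenge) ^ 0x01


monitor = Monitor(max_errors=2)
print(monitor.check(expected_response))  # healthy main unit
print(monitor.check(faulty_main))        # first wrong answer, still tolerated
print(monitor.check(faulty_main))        # second wrong answer, safe state
```

Because the check runs only once per exchange, a fault is noticed at the next challenge at the earliest, which is consistent with the detection latency drawback listed on the slide.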
E-GAS CONCEPT
SW diversified redundancy with 2 independent units
[Figure: main CPU running SW1 and SW2, plus a MISR]
Advantages:
• Low HW overhead
• SW systematic failures detection
Drawbacks:
• Significant SW and performance overhead
• Poor transient fault coverage, reusability, and availability
• Susceptible to common-cause failures
• High error detection latency
Achieves ASIL B
HARDENED BY-DESIGN
Each processor functional unit is independently hardened
[Figure: hardened CPU running SW]
Advantages:
• Low HW overhead
• Low performance overhead
• Optimized solution
Drawbacks:
• Significant design overhead
• Need to know the processor's internal description
• Very low reusability, very specific solution
Example: LEON3 FT
TIGHTLY COUPLED 2 CORE
Asymmetric Redundancy
[Figure: master CPU running SW, connected through a CPU interface to an optimized supervisor (checker)]
Advantages:
• Low HW and power consumption overhead
• Negligible SW overhead
• Common-cause and HW systematic failures detection
• Low error detection latency, good availability
Drawbacks:
• Very detailed analysis required
• CPU interface needed
Achieves ASIL D
YOGITECH’S FAULT-ROBUST
• The supervisor is designed exploiting a white-box approach
• Meets IEC 61508 and ISO 26262 requirements
• One main supervisor for the CPU and a set of remote supervisors, one for each specific region of the system
• Hardware-centric approach
[Figure: master CPU running SW, connected through a CPU interface to the main supervisor; remote supervisors connected through robustnet]
From [4].
MAIN SUPERVISOR
The CPU Checking Unit checks the instruction execution, the program flow, and the data processing:
• The CPU sniffer collects, compacts, codes, and buffers signals from the CPU and forwards them to the supervisors
• Each supervisor is composed of a data-path, a sequencer, and a checker
The System Control Unit decides whether the system is in a wrong state and performs the necessary actions
[Figure: main supervisor between the CPU interface and robustnet. The CPU Checking Unit contains the CPU sniffer and the Data, Mode, Instruction execution, and Data address supervisors, next to the System Control Unit]
REMOTE SUPERVISORS
• The memory supervisor makes it possible to store ECC codes and to share them among multiple memories
• Peripheral supervisors implement a hardware verification component
• The bus supervisors monitor the sources and sinks of the bus and perform data integrity checks
[Figure: remote supervisors attached to robustnet: Memory, Peripheral, Bus, and Custom Supervisors]
METHODOLOGY FLOW
[Figure: design flow supported by automatic tools: Safety Requirements Specification, Failure Modes and Effects Analysis, Fault Injection, SFF/DC reports]
SFF: Safe Failure Fraction
DC: Diagnostic Coverage
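The fault injection step of such a flow can be illustrated with a toy campaign: every single and double bit-flip is injected into a parity-protected word, and the diagnostic coverage is taken as detected faults over injected faults. The parity-protected word, the fault model, and treating every injected fault as dangerous are assumptions chosen for brevity; they do not describe the tools used by the authors.

```python
from itertools import combinations

WIDTH = 9  # 8 data bits + 1 even-parity bit (hypothetical protected word)


def add_parity(data: int) -> int:
    """Append an even-parity bit to an 8-bit value."""
    data &= 0xFF
    parity = bin(data).count("1") & 1
    return (data << 1) | parity


def parity_ok(word: int) -> bool:
    """The checker: a valid word has an even number of set bits."""
    return bin(word).count("1") % 2 == 0


def diagnostic_coverage(data: int, max_flips: int = 2) -> float:
    """Inject every pattern of 1..max_flips bit-flips; return detected/injected."""
    golden = add_parity(data)
    injected = detected = 0
    for flips in range(1, max_flips + 1):
        for bits in combinations(range(WIDTH), flips):
            faulty = golden
            for bit in bits:
                faulty ^= 1 << bit
            injected += 1
            if not parity_ok(faulty):
                detected += 1
    return detected / injected


print(f"single flips only: DC = {diagnostic_coverage(0xA5, 1):.0%}")
print(f"single and double: DC = {diagnostic_coverage(0xA5, 2):.0%}")
```

Parity catches every single-bit flip but no double-bit flip, so the measured coverage depends heavily on which faults the campaign injects.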
RESULTS
• PHILIPS SJA2510 FlexRay microcontroller
• <30% HW overhead for CPU protection
• <10% HW overhead for memory protection
• A greater level of optimization can be reached if the configuration is more application specific
From [4].
CONCLUSIONS
OLD TREND
• HW is unreliable by definition
• HW is stupid: in case of failure it cannot tell what is going on
• HW/SW redundancy is the only way to guarantee the availability of safety-critical systems
NEW TREND
• Design-for-Uncertainty must become a new paradigm
• FMEA down to the gate level should become a de facto standard
• New architectures should embed methods to detect and control errors
The proposed platform-based solution aims at reducing the HW and SW costs needed to implement fault-robust MCUs in adherence with IEC 61508 SIL 3. This is achieved by implementing an optimized HW CPU fault detection, by providing dedicated HW to replace, support, or supplement SW tests, and by distributing robustness to the whole SoC. The proposed approach is scalable, flexible, portable, and reusable by design.
BIBLIOGRAPHY
1. P. Harrod, Dependable Processor Design - TSS presentation, 2012
2. C. Bolchini, Dependable Systems - Course slides, 2012
3. M. Bellotti, R. Mariani, How future automotive functional safety requirements will impact microprocessors design - Microelectronics Reliability, 2010
4. R. Mariani, P. Fuhrmann, B. Vittorelli, Fault-robust microcontrollers for automotive applications, IEEE International On-Line Testing Symposium (IOLTS), 2006
5. M. Baleani, A. Ferrari, L. Mangeruca, A. Sangiovanni-Vincentelli, M. Peri, S. Pezzini, Fault-Tolerant Platforms for Automotive Safety-Critical Applications, International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), 2003
6. J.C. Laprie, Dependable Computing: Concepts, Limits, Challenges, IEEE International Symposium on Fault-Tolerant Computing, 1995