online and operand-aware detection of failures utilizing false alarm vectors amir yazdanbakhsh,...
TRANSCRIPT
![Page 1: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/1.jpg)
Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors
Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung Kim
Department of Electrical and Computer Engineering
![Page 2: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/2.jpg)
2
Introduction• Technology scaling beyond 32nm degrades the
manufacturing yieldo Can be addressed by imposing restrictive design rules or just
using regular fabrics[T. Jhaveri, SPIE’06]
o Can be addressed by using configurable logic blocks to make post-silicon corrections[Y. Ran, TVLSI’06]
o Redundancy based techniques can also be usedo Exploiting existing redundancy in high performance processors
[P. Shivakumar, ICCD’12][S. Shyam, ASPLOS’06][J. Srinivasan, ISCA’05]o Incorporate redundancy at the granularity of a bit slice
[K. Namba, PRDC’05]
![Page 3: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/3.jpg)
3
Motivation
C7 C4
01234567
Defective prefix nodeimpacts C4 and C7
0000…11010…0110
Minimized vectors
Recover
Checker
• The checker detects a match with the faulty vectors and a small number of False Alarm vectors at runtime.
![Page 4: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/4.jpg)
4
Contributions
Checker Unit Module
False Alarm Vectors
Flexible option for online and operand-level fault detection Update faulty vectors over the time TCAM-based implementation which can store cubes with
don’t care No extra logic on the critical paths
Efficient use of false alarm vectors to reduce the number ofvectors to be checked, thus reducing the TCAM area
Integrate the false alarm insertion into ESPRESSO 2-level logicminimization tool
The recovery flag is not falsely activated too frequently
![Page 5: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/5.jpg)
5
Checker Unit: Comparison with A Redundancy-based Alternative• Checker Unit • Redundancy-based
Recover
0xxx11x0
TCAM
Operand Checker
Does not affect the critical path
Flexible checker unit (can update faulty vectors)
Online and operand-aware detection of failures
⤫ Affects the critical path (large muxes)
⤫ Fixed design approach (can not be updated)
⤫ Two out of three adders should always be fault-free
![Page 6: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/6.jpg)
6
Overview of TCAM• TCAM can store test cubes which have don’t care bits• Conventional TCAM needs to support random access to a
specific entry to update the key value at runtime– Requires a log N-to-N decoder for a TCAM with N entries– The checker unit does not need such a decoder
• Each entry must be updated only once, every time the chip is turned on• Supporting a sequential access to write the test cubes to the TCAM is
sufficient
• In our framework the size of TCAM can get impractically large if all the faulty are individually stored– We propose to a few false alarm vectors to reduce number of
entries in the TCAM and therefore reduce the TCAM size
![Page 7: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/7.jpg)
7
False Alarm Insertion to Minimize the TCAM Size: Example
A
B
CD
E
A B C D E
x 1 0 x x0 1 1 0 x
1 1 1 0 0
1 1 1 0 1
x 0 0 x 1
x 0 1 0 1
V1
V2
V3
V4
V5
V6
Identify cubes which excite fault
![Page 8: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/8.jpg)
8
A B C D E
x 1 0 x x0 1 1 0 x
1 1 1 0 0
1 1 1 0 1
x 0 0 x 1
x 0 1 0 1
V1
V2
V3
V4
V5
V6
A B C D E
x x 0 x 1x 1 0 x x
x x x 0 1
x 1 x 0 x
V1
V2
V3
V4
Identify cubes which excite fault
Test cube minimization
False Alarm Insertion to Minimize the TCAM Size: Example
![Page 9: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/9.jpg)
9
A B C D E
x x 0 x 1x 1 0 x x
x x x 0 x
V1
V2
V3
A B C D E
x x 0 x 1x 1 0 x x
x x x 0 1
x 1 x 0 x
V1
V2
V3
V4
• We reduce the number of test cubes from 6 to 3
Identify cubes which excite fault
Test cube minimization
Further minimization with
False Alarm Insertion
False Alarm Insertion to Minimize the TCAM Size: Example
![Page 10: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/10.jpg)
10
False Alarm Insertion
Problem Definition• Reduce the number of cubes beneath the given
threshold by adding as few false alarm vectors as possible
Why we need False Alarms?• Due to area budget, number of entries in TCAM is
limited• The number of test cubes translates to the number of
entries in the TCAM
![Page 11: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/11.jpg)
11
Using Two-Level Logic Minimization
• Two-level logic minimization can be used to minimize the number of test cubes
• We expand the ESPRESSO* tool by inserting false alarm vectors to achieve higher minimization
*ESPRESSO. http://embedded.eecs.berkeley.edu/pubs/downloads/espresso/.
![Page 12: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/12.jpg)
12
False Alarm Insertion by Extending ESPRESSO
F = IRREDUNDANT (FON, FDC)F = REDUCE (FON, FDC)F = EXPAND (FON, FOFF)
Stop Minimization
?
Test cubes
Minimized test cubes
Overview of the main loop of ESPRESSO
F = EXPAND-FA (FON, FOFF)F = IRREDUNDANT (FON, FDC)F = REDUCE (FON, FDC)F = EXPAND (FON, FOFF)
# vectors <
threshold
Minimized cubes
Minimized cubes with false alarm
Extension with False Alarm insertion
![Page 13: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/13.jpg)
13
False Alarm Insertion Example
EXPAND-FA IRREDUNDANT
REDUCE EXPAND
![Page 14: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/14.jpg)
14
False Alarm Insertion Procedure
• Each call EXPAND-FA function expands multiple test cubes– How I sequentially go through the on-set?– Look at the paper– Which cube is selected to be expanded?– Same section – stopping criteria (when you
reach the target number of cubes)
![Page 15: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/15.jpg)
15
False Alarm Insertion for One CubeA0 A1 A2 A3
0 0 0 xON1
x x 1 x
x 1 x x
1 x x 1
OFF1
OFF2
OFF3
Offset Matrix
False Alarm Matrix
0 0 2 -
0 2 0 -
1 0 0 -
B1
B2
B3
1 2 2 -
1 1 0 0
0 0 0 0
0 0 0 0
1 0 0 0
OFF1
OFF2
OFF3
ON1
ON2
ON’2
• False Alarm Matrix (i, j)– Entry (i, j) indicates false alarms between the off-set cube i
and (the expanded) cube when literal j is dropped
A0A1
A2A3
![Page 16: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/16.jpg)
16
Simulation Configuration
• Single-failure scenarios in various nodes of32-bit Brent-Kung adder (prefix adder)
• Generate all the test vectors for two failing cases modeled by a stuck-at-0 and stuck-at-1 using ATALANTA* ATPG toolset
• Using SPEC2006 suite for workload-dependent case• Record the input arguments to the adder by running each
benchmark on an X86 simulator
• Analyzing area overhead in 2-issue and 4-issue microprocessors
*H.K. Lee and D.S. Ha. Atalanta: an efficient ATPG for combinational circuits. Technical Report;
Department of Electrical Engineering, Virginia Polytechnic Institute and State University, pages 93
12, 1993.
![Page 17: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/17.jpg)
17
Comparison of Probability of Detection• Probability of detection: percentage of times that the checker unit
activates the recovery signal (could be false alarm or true positive)
Average PoD degrades with decrease in the number of test cubes Average PoD after inserting false alarms does not degrade
significantly in FA-128 or FA-64 or FA-32 compared to W/O FA This behavior is true for both workload-dependent and random cases
![Page 18: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/18.jpg)
18
Comparison of False Alarm Insertion Algorithms• FA-Ag Algorithm
– At each iteration, all the cubes are expanded using the expand-FA procedure.
• Each entry indicates the fraction of false alarms from the total number of detection ( )– FA denotes the number fo false alarm minterms and TP the number of true
positive when a fault is truly happening.
On-average FA-Ag results in more overhead with increase in the number of test cubes compared to FA
![Page 19: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/19.jpg)
19
Area Overhead
• Implemented approaches– Baseline k+1 (for k=2 , 4)
• K-issue processor with 1 redundant component
– K+TCAM• K-issue processor with checker implemented as TCAM
– K+FPGA• K-issue processor with checker implemented as FPGA
• 2+TCAM has better area than 2+1 for 32 and 48 cubes• 2+FPGA always has more area than baseline• Similar behavior for 4+TCAM and 4+FPGA
![Page 20: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/20.jpg)
20
Conclusion
A new framework for online detection of failures at operand level of granularity
Design a flexible TCAM-based checker unit Propose a false alarm insertion algorithm to reduce the
number of vectors below the given threshold Incorporate the false alarm insertion algorithm into
ESPRESSO 2-level logic minimization tool Future works:
Use checker unit for other existing modules inside the processor Utilizing the online and operand-aware detection for other type of
faults such as delay path fault
![Page 21: Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung](https://reader036.vdocument.in/reader036/viewer/2022062519/5697c0211a28abf838cd2cde/html5/thumbnails/21.jpg)
21
Questions?