[Figure: processor pipeline. Labels: PC, I-Cache, IF/ID Stage, DEC 1, DEC 2, Instruction Scheduler, Enable Lines, ID/EXE Stage, Reg File, Integer ALU1, Integer ALU2, EXE/MEM Stage, D-Cache, MEM/WB Stage.]
Diagnosing Intermittent Faults Using Software Techniques
Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan
The University of British Columbia
Intermittent Faults
Research Objective
Diagnosis Technique Goals
Overview of the Diagnosis Approach
Isolate Fault-Prone Unit
Intermittent hardware faults are bursts of errors that occur at the same location and last from a few cycles to a few seconds.
Intermittent faults will be a significant concern in future processors.
[Figure: Transient Fault vs. Intermittent Faults along the program execution time axis, ending in Failure. Instructions shown:]
mov R1, #5
mov R2, #6
mov R3, #7
ld R4, R1, Array_Addr
ld R5, R2, Array_Addr
ld R6, R3, Array_Addr
mult R7, R5, R4
Research Motivation
Diagnosis is vital in guiding fine-grained recovery techniques (e.g., hardware reconfiguration) and hence in letting the processor continue operating at degraded performance.
[Figure: three chip diagrams with cores numbered 1 to 8; the first diagram shows only cores 1 to 7.]
If core 8 malfunctions, two recovery options are possible:
1. The whole of core 8 is disabled, without fine-grained diagnosis, or
2. Part of core 8 is disabled, with fine-grained diagnosis.
Goals: the diagnosis technique should require no hardware support, provide formal guarantees of correctness and completeness, be scalable, and produce few false positives.
Modeling Intermittent Faults Impact on Programs - Example
Code Fragment              Node
mov R1, #5                 1
mov R2, #6                 2
mov R3, #7                 3
ld R4, R1, Array_Addr      4
ld R5, R2, Array_Addr      5
ld R6, R3, Array_Addr      6
mult R7, R5, R4            7
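As a rough illustration of how a dynamic dependency graph (DDG) could be derived from the fragment above, the following Python sketch links each node to the nodes that produced its source registers. The trace encoding and the `build_ddg` helper are assumptions made for this example and are not taken from the poster; immediates and `Array_Addr` are treated as constants rather than as nodes.

```python
# Each entry: (node id, destination register, source registers).
# This encoding of the fragment is an assumption for illustration only.
TRACE = [
    (1, "R1", []),            # mov R1, #5
    (2, "R2", []),            # mov R2, #6
    (3, "R3", []),            # mov R3, #7
    (4, "R4", ["R1"]),        # ld R4, R1, Array_Addr
    (5, "R5", ["R2"]),        # ld R5, R2, Array_Addr
    (6, "R6", ["R3"]),        # ld R6, R3, Array_Addr
    (7, "R7", ["R5", "R4"]),  # mult R7, R5, R4
]


def build_ddg(trace):
    """Return {node: set of producer nodes} for a dynamic instruction trace."""
    last_writer = {}  # register -> node that most recently wrote it
    preds = {}
    for node, dest, sources in trace:
        # A node depends on whichever node last wrote each of its sources.
        preds[node] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = node
    return preds


ddg = build_ddg(TRACE)
print(sorted(ddg[7]))  # [4, 5]
```

Here node 7 (the `mult`) depends on nodes 4 and 5, which in turn depend on nodes 1 and 2.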
Modeling Intermittent Faults Impact on Programs - Results
The DDG model is more than two orders of magnitude faster than equivalent fault-injection experiments.
89 to 93% of the faults' crash distances are within 100 nodes.
[Figure labels: Crash Model (Dynamic Dependency Graph, Fault Model) giving the expected IPS and CD; SimpleScalar Simulator giving the actual IPS and CD.]
Intermittent Error
Program Crash/Error Detected
Crash Dump File (e.g., crash state and inputs)
• Run Fault-Free
• Construct DDG
• Diagnose Error
Faulty Instructions
Overview of the Diagnosis Approach - Example
Identify Erroneous Data
An intermittent fault affected nodes 14-18.
Crash instruction: 27.
Erroneous data: 14, 17, 16, 19 and 21.
The expected fault spans nodes 14-19.
The actual fault affected nodes 14-18.
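One way to picture the backtracing step is the minimal Python sketch below. It walks a DDG backward from the crash node, keeps the erroneous nodes that are reachable, and then drops any erroneous node that merely inherited its error from an erroneous producer. Only the node numbers (crash node 27, erroneous nodes 14, 16, 17, 19 and 21) come from the example; the edges in `PREDS` and both helper functions are invented for illustration and are not the authors' algorithm.

```python
def backtrace(preds, crash_node):
    """Return every node the crash node depends on, including itself."""
    seen, stack = set(), [crash_node]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(preds.get(node, ()))
    return seen


def first_affected(preds, erroneous):
    """Keep erroneous nodes none of whose producers are erroneous."""
    return {n for n in erroneous if not (preds.get(n, set()) & erroneous)}


# Hypothetical edges (node -> producer nodes); not the poster's graph.
PREDS = {
    10: set(), 11: set(), 12: set(), 13: set(),
    14: {10}, 16: {11}, 17: {12}, 19: {13},
    21: {19},
    27: {14, 16, 17, 21},
}
ERRONEOUS = {14, 16, 17, 19, 21}  # erroneous data from the example
CRASH = 27                        # crash instruction from the example

reachable = backtrace(PREDS, CRASH)
suspects = first_affected(PREDS, ERRONEOUS & reachable)
span = (min(suspects), max(suspects))
print(sorted(suspects), span)  # [14, 16, 17, 19] (14, 19)
```

With these assumed edges, node 21 is filtered out because its only producer (node 19) is already erroneous, which leaves a suspected span of nodes 14-19.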
[Figure: dynamic dependency graph of the code fragment, with nodes 1 to 7, the inputs #5, #6, #7 and Array_Addr, and the region marked Intermittent Error.]
Identify Faulty Instructions
Filtering
Isolate Fault-Prone Unit
Potential Hardware Support
Operating Systems Directions
Contact Information
Layali Rashid
PhD Candidate
Department of Electrical and Computer Engineering
The University of British Columbia
[email protected]
1. Identify Erroneous Data
2. Identify Instructions that Change Erroneous Data
3. Isolate Instructions First Affected by the Fault
Of the intermittent faults that are non-benign, 95% result in a program crash.
91 to 95% of the faults cause the program to crash within 300 nodes of the fault's start.
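To make the "within N nodes" statistic concrete, the sketch below assumes that crash distance is the number of DDG nodes between the fault's start and the crash node; that definition is an assumption here, inferred from the wording above. Apart from the first pair (fault start 14, crash 27, taken from the example), the run data is made up and is not measured data.

```python
def crash_distance(fault_start, crash_node):
    """Nodes between the fault's start and the crash (assumed definition)."""
    return crash_node - fault_start


def fraction_within(distances, limit):
    """Fraction of runs whose crash distance is at most `limit` nodes."""
    return sum(d <= limit for d in distances) / len(distances)


# (fault start node, crash node) pairs; all but the first are hypothetical.
runs = [(14, 27), (100, 350), (40, 520), (7, 95)]
distances = [crash_distance(start, crash) for start, crash in runs]
print(distances)                        # [13, 250, 480, 88]
print(fraction_within(distances, 300))  # 0.75
```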
Conclusions
Diagnosis is vital in guiding fine-grained recovery.
Diagnosing intermittent faults using software techniques is possible.
Most intermittent faults cause the program to crash shortly after the fault's start.
Use Dynamic Dependency Graph (DDG).
Map tasks to cores based on the core's functioning units and the task's requirements.
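A simple way such a mapping could work is sketched below: each task goes to the least-loaded core that still has every unit the task needs. The greedy policy, the core and unit names, and the `assign_tasks` helper are all hypothetical; the poster does not specify a mapping algorithm.

```python
def assign_tasks(cores, tasks):
    """Greedy mapping: each task goes to the least-loaded core that has
    every unit the task needs. Returns {task: core name or None}."""
    load = {core: 0 for core in cores}
    mapping = {}
    for task, needed in tasks.items():
        capable = [c for c, units in cores.items() if needed <= units]
        if not capable:
            mapping[task] = None  # no core can run this task
            continue
        chosen = min(capable, key=lambda c: (load[c], c))
        load[chosen] += 1
        mapping[task] = chosen
    return mapping


# Hypothetical cores and their functioning units.
CORES = {
    "core7": {"int_alu", "fpu"},
    "core8": {"int_alu"},  # assume diagnosis disabled core8's fpu
}
# Hypothetical tasks and the units they require.
TASKS = {
    "t_fp": {"int_alu", "fpu"},
    "t_int1": {"int_alu"},
    "t_int2": {"int_alu"},
}

mapping = assign_tasks(CORES, TASKS)
print(mapping)  # {'t_fp': 'core7', 't_int1': 'core8', 't_int2': 'core7'}
```

The partially disabled core still receives integer-only work, which is the benefit of disabling part of a core rather than the whole core.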
Modify a program on the fly to avoid using malfunctioning units.
Provide feedback to instruction scheduler about the malfunctioning units, such that minimal performance overhead is encountered.
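The feedback idea can be pictured as a per-unit enable flag that the scheduler consults before issuing an instruction. The sketch below is a software toy with hypothetical unit names and a round-robin policy; it is not a description of real scheduler hardware or of the authors' design.

```python
from itertools import cycle


def make_scheduler(units, enabled):
    """Return a function that issues each instruction to the next enabled unit."""
    usable = [u for u in units if enabled.get(u, False)]
    if not usable:
        raise ValueError("no functional unit is enabled")
    rotation = cycle(usable)  # round-robin over the enabled units only
    return lambda instr: (instr, next(rotation))


# Assume diagnosis flagged ALU2 as fault-prone, so its enable flag is cleared.
issue = make_scheduler(["ALU1", "ALU2"], {"ALU1": True, "ALU2": False})
issued = [issue(i) for i in ["add", "sub", "add"]]
print(issued)  # [('add', 'ALU1'), ('sub', 'ALU1'), ('add', 'ALU1')]
```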
Backtrace erroneous data in the DDG.